| rfc8881.original.xml | rfc8881.xml | |||
|---|---|---|---|---|
| <?xml version='1.0' encoding='utf-8'?> | ||||
| <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent"> | ||||
| <rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std" docName="draft-ietf-nfsv4-rfc5661sesqui-msns-04" number="8881" obsoletes="5661" ipr="pre5378Trust200902" updates="" submissionType="IETF" consensus="true" xml:lang="en" tocInclude="true" tocDepth="2" symRefs="false" sortRefs="false" version="3"> | ||||
| <!-- xml2rfc v2v3 conversion 2.41.0 --> | ||||
| <front> | ||||
| <title abbrev="NFSv4.1 with Namespace Update "> | ||||
| Network File System (NFS) Version 4 Minor Version 1 Protocol | ||||
| </title> | ||||
| <seriesInfo name="RFC" value="8881"/> | ||||
| <author fullname="David Noveck" initials="D." surname="Noveck" role="editor"> | ||||
| <organization abbrev="NetApp">NetApp</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1601 Trapelo Road, Suite 16</street> | ||||
| <city>Waltham</city> | ||||
| <region>MA</region> | ||||
| <code>02451</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <phone>+1-781-768-5347</phone> | ||||
| <email>dnoveck@netapp.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author initials="C." surname="Lever" fullname="Charles Lever"> | ||||
| <organization abbrev="ORACLE"> | ||||
| Oracle Corporation | ||||
| </organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1015 Granger Avenue</street> | ||||
| <city>Ann Arbor</city> | ||||
| <region>MI</region> | ||||
| <code>48104</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <phone>+1-248-614-5091</phone> | ||||
| <email>chuck.lever@oracle.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <date month="July" year="2020"/> | ||||
| <area>Transport</area> | ||||
| <workgroup>NFSv4</workgroup> | ||||
| <keyword>example</keyword> | ||||
| <abstract> | ||||
| <t> | ||||
| This document describes the Network File System (NFS) version 4 | ||||
| minor version 1, | ||||
| including features retained from the base protocol (NFS version 4 minor | ||||
| version 0, which is specified in RFC 7530) and protocol | ||||
| extensions made subsequently. The later minor version | ||||
| has no dependencies on NFS version 4 minor version 0, and | ||||
| is considered a separate protocol. | ||||
| </t> | ||||
| <t> | ||||
| This document obsoletes RFC 5661. It substantially revises the treatment | ||||
| of features relating to multi-server namespace, superseding the | ||||
| description of those features appearing in RFC 5661. | ||||
| </t> | ||||
| </abstract> | ||||
| </front> | ||||
| <middle> | ||||
| <section anchor="intro" numbered="true" toc="default"> | ||||
| <name>Introduction</name> | ||||
| <section anchor="intro_the_document" numbered="true" toc="default"> | ||||
| <name>Introduction to This Update</name> | ||||
| <t> | ||||
| Two important features previously defined in minor version 0 but | ||||
| never fully addressed in minor version 1 are trunking, which is the | ||||
| simultaneous use of | ||||
| multiple connections between a client and server, potentially to | ||||
| different network addresses, and Transparent State Migration, which | ||||
| allows a file system to be transferred between servers in a way that | ||||
| lets the client maintain its existing locking | ||||
| state across the transfer. | ||||
| </t> | ||||
| <t> | ||||
| The revised description of the NFS version 4 minor version 1 | ||||
| (NFSv4.1) protocol presented in this update is necessary to enable | ||||
| full use of these features together with other multi-server namespace | ||||
| features. This document is in the form of an updated description of | ||||
| the NFSv4.1 protocol previously defined in RFC 5661 | ||||
| <xref target="RFC5661" format="default"/>. | ||||
| RFC 5661 is obsoleted by this document. However, the update has a | ||||
| limited scope and is focused on enabling full use of trunking and | ||||
| Transparent State Migration. The need for these changes is discussed | ||||
| in <xref target="NEED"/>. <xref target="CHG"/> describes the specific changes made to | ||||
| arrive at the current text. | ||||
| </t> | ||||
| <t> | ||||
| This limited-scope update replaces the current NFSv4.1 RFC with the | ||||
| intention of providing an authoritative and complete specification, the | ||||
| motivation for which is discussed in | ||||
| <xref target="I-D.roach-bis-documents" format="default"/>, | ||||
| addressing the issues within the scope of the update. However, it will | ||||
| not address known issues that fall outside this limited scope, | ||||
| as would be expected of a full update of the protocol. Below are some | ||||
| areas that are known to need addressing in a future update of the | ||||
| protocol: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Work needs to be done with regard to RFC 8178 | ||||
| <xref target="RFC8178" format="default"/>, which establishes NFSv4-wide | ||||
| versioning rules. As | ||||
| RFC 5661 is currently inconsistent with | ||||
| that document, changes are needed in order | ||||
| to arrive at a situation in which there | ||||
| would be no need for RFC 8178 to update the NFSv4.1 specification. | ||||
| </li> | ||||
| <li> | ||||
| Work needs to be done with regard to RFC 8434 | ||||
| <xref target="RFC8434" format="default"/>, which establishes the requirements | ||||
| for parallel NFS (pNFS) layout types, which are not clearly defined in | ||||
| RFC 5661. When that | ||||
| work is done and the resulting documents approved, | ||||
| the new NFSv4.1 specification document will provide a clear set | ||||
| of requirements for layout types and a description of the file layout | ||||
| type that conforms to those requirements. Other layout types will | ||||
| have their own specification documents that conform to those | ||||
| requirements as well. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Work needs to be done to address many errata reports relevant to | ||||
| RFC 5661, other than errata report 2006 <xref target="Err2006" format="default"/>, | ||||
| which is addressed in this document. | ||||
| Addressing that report was not deferrable because of the | ||||
| interaction of the changes suggested there | ||||
| and the newly described handling of state and session migration. | ||||
| </t> | ||||
| <t> | ||||
| The errata reports that have been deferred and that will need to | ||||
| be addressed in a later document include reports currently assigned | ||||
| a range of statuses in the errata reporting system, including reports | ||||
| marked Accepted and those marked Hold For Document Update | ||||
| because the change was | ||||
| too minor to address immediately. | ||||
| </t> | ||||
| <t> | ||||
| In addition, there is a set of other reports, including at least one | ||||
| in state Rejected, that will need to be addressed in a later document. | ||||
| This will involve making changes to consensus decisions reflected | ||||
| in RFC 5661, in situations in which the working group has decided that | ||||
| the treatment in RFC 5661 is incorrect and needs to be revised to | ||||
| reflect the working group's new consensus and to ensure compatibility | ||||
| with existing implementations that do not follow the handling | ||||
| described in RFC 5661. | ||||
| </t> | ||||
| <t> | ||||
| Note that it is expected that all such errata reports will remain | ||||
| relevant to implementors and the authors of an eventual rfc5661bis, | ||||
| despite the fact that this document, when approved, | ||||
| will obsolete RFC 5661 <xref target="RFC5661" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| There is a need for a new approach to the description of | ||||
| internationalization since the current internationalization section | ||||
| (<xref target="internationalization" format="default"/>) has never been | ||||
| implemented and does | ||||
| not meet the needs of the NFSv4 protocol. Possible solutions are | ||||
| to create a new internationalization section modeled on that in | ||||
| <xref target="RFC7530" format="default"/> or to create a new document describing | ||||
| internationalization for all | ||||
| NFSv4 minor versions and reference that document in the RFCs | ||||
| defining both NFSv4.0 and NFSv4.1. | ||||
| </li> | ||||
| <li> | ||||
| There is a need for a revised treatment of security | ||||
| in NFSv4.1. The issues with the existing treatment are discussed in | ||||
| <xref target="SECBAD" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Until the above work is done, there will not be a consistent set of | ||||
| documents that provides a description of the NFSv4.1 protocol, and any | ||||
| full description would involve documents updating other documents | ||||
| within the specification. The updates applied by | ||||
| RFC 8434 <xref target="RFC8434" format="default"/> and RFC 8178 | ||||
| <xref target="RFC8178" format="default"/> | ||||
| to RFC 5661 also apply to this specification, and will apply to | ||||
| any subsequent v4.1 specification until that work is done. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="intro_the_protocol" numbered="true" toc="default"> | ||||
| <name>The NFS Version 4 Minor Version 1 Protocol</name> | ||||
| <t> | ||||
| The NFS version 4 minor version 1 (NFSv4.1) protocol | ||||
| is the second minor version of the NFS version 4 | ||||
| (NFSv4) protocol. The first minor version, NFSv4.0, is | ||||
| now described in RFC 7530 <xref target="RFC7530" format="default"/>. NFSv4.1 generally | ||||
| follows the guidelines for minor versioning that are | ||||
| listed in Section <xref target="RFC3530" sectionFormat="bare" section="10"/> | ||||
| of RFC 3530 <xref target="RFC3530" format="default"/>. However, it | ||||
| diverges from guidelines 11 ("a client and server | ||||
| that support minor version X must support minor | ||||
| versions 0 through X-1") and 12 ("no new features may be | ||||
| introduced as mandatory in a minor version"). These | ||||
| divergences are due to the introduction of | ||||
| the sessions model for managing non-idempotent | ||||
| operations and the RECLAIM_COMPLETE operation. | ||||
| These two new features are infrastructural in | ||||
| nature and simplify implementation of existing and | ||||
| other new features. Making them anything but <bcp14>REQUIRED</bcp14> | ||||
| would add undue complexity to protocol definition and | ||||
| implementation. NFSv4.1 accordingly updates the | ||||
| <xref target="minor_versioning" format="default">minor versioning | ||||
| guidelines</xref>. | ||||
| </t> | ||||
| <t> | ||||
| As a minor version, NFSv4.1 is consistent with the overall | ||||
| goals for NFSv4, but extends the protocol so as to | ||||
| better meet those goals, based on experiences with NFSv4.0. | ||||
| In addition, NFSv4.1 has adopted some additional goals, which | ||||
| motivate some of the major extensions in NFSv4.1. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Requirements Language</name> | ||||
| <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL | ||||
| NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and | ||||
| "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in | ||||
| RFC 2119 <xref target="RFC2119"/>.</t> | ||||
| </section> | ||||
| <section anchor="scope_of_doc" numbered="true" toc="default"> | ||||
| <name>Scope of This Document</name> | ||||
| <t> | ||||
| This document describes the NFSv4.1 protocol. With | ||||
| respect to NFSv4.0, this document does not: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| describe the NFSv4.0 protocol, except where needed | ||||
| to contrast with NFSv4.1. | ||||
| </li> | ||||
| <li> | ||||
| modify the specification of the NFSv4.0 protocol. | ||||
| </li> | ||||
| <li> | ||||
| clarify the NFSv4.0 protocol. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="version4_goals" numbered="true" toc="default"> | ||||
| <name>NFSv4 Goals</name> | ||||
| <t> | ||||
| The NFSv4 protocol is a further revision of the NFS protocol | ||||
| defined already by NFSv3 | ||||
| <xref target="RFC1813" format="default"/>. It retains | ||||
| the essential characteristics of previous versions: easy | ||||
| recovery; independence of transport protocols, operating systems, and | ||||
| file systems; simplicity; and good performance. NFSv4 has the following goals: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Improved access and good performance on the Internet | ||||
| </t> | ||||
| <t> | ||||
| The protocol is designed to transit firewalls easily, perform well | ||||
| where latency is high and bandwidth is low, and scale to very | ||||
| large numbers of clients per server. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Strong security with negotiation built into the protocol | ||||
| </t> | ||||
| <t> | ||||
| The protocol builds on the work of the ONCRPC working group in | ||||
| supporting the RPCSEC_GSS protocol. Additionally, the | ||||
| NFSv4.1 protocol provides a mechanism that allows clients and | ||||
| servers to negotiate security and requires clients and servers to | ||||
| support a minimal set of security schemes. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Good cross-platform interoperability | ||||
| </t> | ||||
| <t> | ||||
| The protocol features a file system model that provides a useful, | ||||
| common set of features that does not unduly favor one file system | ||||
| or operating system over another. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Designed for protocol extensions | ||||
| </t> | ||||
| <t> | ||||
| The protocol is designed to accept standard extensions within a | ||||
| framework that enables and encourages backward compatibility. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="minor_version1_goals" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Goals</name> | ||||
| <t> | ||||
| NFSv4.1 has the following goals, within the framework | ||||
| established by the overall NFSv4 goals. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| To correct significant structural weaknesses and oversights | ||||
| discovered in the base protocol. | ||||
| </li> | ||||
| <li> | ||||
| To add clarity and specificity to areas left | ||||
| unaddressed or not addressed in sufficient | ||||
| detail in the base protocol. However, as stated | ||||
| in <xref target="scope_of_doc" format="default"/>, it is not | ||||
| a goal to clarify the NFSv4.0 protocol in the | ||||
| NFSv4.1 specification. | ||||
| </li> | ||||
| <li> | ||||
| To add specific features based on experience with the existing | ||||
| protocol and recent industry developments. | ||||
| </li> | ||||
| <li> | ||||
| To provide protocol support to take advantage of clustered | ||||
| server deployments including the ability to provide scalable | ||||
| parallel access to files distributed among multiple servers. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="intro_definitions" numbered="true" toc="default"> | ||||
| <name>General Definitions</name> | ||||
| <t> | ||||
| The following definitions provide an appropriate context for the reader. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Byte:</dt> | ||||
| <dd anchor="byte"> | ||||
| In this document, a byte is an octet, i.e., a datum | ||||
| exactly 8 bits in length. | ||||
| </dd> | ||||
| <dt>Client:</dt> | ||||
| <dd anchor="client_def"> | ||||
| <t> | ||||
| The client is the entity that accesses the NFS server's | ||||
| resources. The client may be an application that contains | ||||
| the logic to access the NFS server directly. The client | ||||
| may also be the traditional operating system client that | ||||
| provides remote file system services for a set of applications. | ||||
| </t> | ||||
| <t> | ||||
| A client is uniquely identified by a client owner. | ||||
| </t> | ||||
| <t> | ||||
| With reference to byte-range locking, the client is also the entity that | ||||
| maintains a set of locks on behalf of one or more | ||||
| applications. This client is responsible for crash or | ||||
| failure recovery for those locks it manages. | ||||
| </t> | ||||
| <t> | ||||
| Note that multiple clients may share the same transport and | ||||
| connection and | ||||
| multiple clients may exist on the same network node. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>Client ID:</dt> | ||||
| <dd> | ||||
| The client ID is a 64-bit quantity used as a unique, shorthand reference to | ||||
| a client-supplied verifier and client owner. The server is | ||||
| responsible for supplying the client ID. | ||||
| </dd> | ||||
| <dt>Client Owner:</dt> | ||||
| <dd> | ||||
| The client owner is a unique string, opaque to the server, | ||||
| that identifies a client. Multiple network connections and source | ||||
| network addresses originating from those connections may share | ||||
| a client owner. The server is expected to treat requests | ||||
| from connections with the same client owner as coming from | ||||
| the same client. | ||||
| </dd> | ||||
| <dt>File System:</dt> | ||||
| <dd> | ||||
| The file system is the collection of objects on a server (as | ||||
| identified by the major identifier of a server | ||||
| owner, which is defined later in this section) | ||||
| that share the same fsid attribute (see <xref target="attrdef_fsid" format="default"/>). | ||||
| </dd> | ||||
| <dt>Lease:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| A lease is an interval of time defined by the server for which the | ||||
| client is irrevocably granted locks. At the end of a | ||||
| lease period, locks may be revoked if the lease has not | ||||
| been extended. A lock must be revoked if a conflicting | ||||
| lock has been granted after the lease interval. | ||||
| </t> | ||||
| <t> | ||||
| A server grants a client a single lease for all state. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>Lock:</dt> | ||||
| <dd> | ||||
| The term "lock" is used to refer to byte-range (in UNIX environments, | ||||
| also known as record) | ||||
| locks, share reservations, delegations, or layouts unless | ||||
| specifically stated otherwise. | ||||
| </dd> | ||||
| <dt>Secret State Verifier (SSV):</dt> | ||||
| <dd> | ||||
| The SSV is a unique secret key shared between a client and | ||||
| server. The SSV serves as the secret key for an internal (that | ||||
| is, internal to NFSv4.1) Generic Security Services (GSS) | ||||
| mechanism (the SSV GSS mechanism; | ||||
| see <xref target="ssv_mech" format="default"/>). The SSV GSS mechanism uses the | ||||
| SSV to compute message integrity code (MIC) and Wrap tokens. | ||||
| See <xref target="protect_state_change" format="default"/> for more details on how NFSv4.1 uses | ||||
| the SSV and the SSV GSS mechanism. | ||||
| </dd> | ||||
| <dt>Server:</dt> | ||||
| <dd> | ||||
| The server is the entity responsible for coordinating | ||||
| client access to a set of file systems and is identified by a server | ||||
| owner. A server can span multiple network addresses. | ||||
| </dd> | ||||
| <dt>Server Owner:</dt> | ||||
| <dd> | ||||
| The server owner identifies the server to the client. | ||||
| The server owner consists of a major identifier and a minor identifier. | ||||
| When the client has two connections each to a peer with the | ||||
| same major identifier, the client assumes that both peers are | ||||
| the same server (the server namespace is the | ||||
| same via each connection) and that | ||||
| lock state is shareable across both connections. When each peer | ||||
| has both the same major and minor identifiers, the client | ||||
| assumes that each connection might be associable with the same session. | ||||
| </dd> | ||||
| <dt>Stable Storage:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Stable storage is storage from which data stored by | ||||
| an NFSv4.1 server can be recovered without data | ||||
| loss from multiple power failures (including cascading | ||||
| power failures, that is, several power failures in quick | ||||
| succession), operating system failures, and/or hardware | ||||
| failure of components other than the storage medium itself | ||||
| (such as disk, nonvolatile RAM, flash memory, etc.). | ||||
| </t> | ||||
| <t> | ||||
| Some examples of stable storage that are allowable for an | ||||
| NFS server include: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Media commit of data; that is, the modified data has | ||||
| been successfully written to the disk media, for | ||||
| example, the disk platter. | ||||
| </li> | ||||
| <li> | ||||
| An immediate reply disk drive with battery-backed, | ||||
| on-drive intermediate storage or uninterruptible power | ||||
| system (UPS). | ||||
| </li> | ||||
| <li> | ||||
| Server commit of data with battery-backed intermediate | ||||
| storage and recovery software. | ||||
| </li> | ||||
| <li> | ||||
| Cache commit with uninterruptible power system (UPS) and | ||||
| recovery software. | ||||
| </li> | ||||
| </ol> | ||||
| </dd> | ||||
| <dt>Stateid:</dt> | ||||
| <dd> | ||||
| A stateid is a 128-bit quantity returned by a server that uniquely | ||||
| defines the open and locking states provided by the server | ||||
| for a specific open-owner or lock-owner/open-owner pair | ||||
| for a specific file and type of lock. | ||||
| </dd> | ||||
| <dt>Verifier:</dt> | ||||
| <dd> | ||||
| A verifier is a 64-bit quantity generated by the client that the server | ||||
| can use to determine if the client has restarted and lost | ||||
| all previous lock state. | ||||
| </dd> | ||||
| </dl> | ||||
| </section> | ||||
| <section anchor="feature-overview" numbered="true" toc="default"> | ||||
| <name>Overview of NFSv4.1 Features</name> | ||||
| <t> | ||||
| This section briefly reviews the major features of | ||||
| the NFSv4.1 protocol | ||||
| to provide an appropriate context for both the reader who is familiar | ||||
| with the previous versions of the NFS protocol and the reader | ||||
| who is new to the NFS protocols. For the reader new to the NFS protocols, | ||||
| some fundamental knowledge is still expected. | ||||
| The reader should be familiar with the External Data | ||||
| Representation (XDR) and Remote Procedure Call (RPC) protocols | ||||
| as described in <xref target="RFC4506" format="default"/> and <xref target="RFC5531" format="default"/>. | ||||
| A basic knowledge of file systems and distributed file systems is expected as well. | ||||
| </t> | ||||
| <t> | ||||
| In general, this specification of NFSv4.1 will | ||||
| not distinguish those features added in minor version | ||||
| 1 from those present in the base protocol but | ||||
| will treat NFSv4.1 as a unified whole. See <xref target="intro_differences" format="default"/> for a summary of | ||||
| the differences between NFSv4.0 and NFSv4.1. | ||||
| </t> | ||||
| <section anchor="rpc_and_security" numbered="true" toc="default"> | ||||
| <name>RPC and Security</name> | ||||
| <t> | ||||
| As with previous versions of NFS, the External Data Representation | ||||
| (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1 protocol are those defined in | ||||
| <xref target="RFC4506" format="default"/> and <xref target="RFC5531" format="default"/>. To | ||||
| meet end-to-end security requirements, the RPCSEC_GSS framework | ||||
| <xref target="RFC2203" format="default"/> is used to extend the basic | ||||
| RPC security. With the | ||||
| use of RPCSEC_GSS, various mechanisms can be provided to offer | ||||
| authentication, integrity, and privacy to the NFSv4 protocol. | ||||
| Kerberos V5 is used as described in | ||||
| <xref target="RFC4121" format="default"/> to provide one | ||||
| security framework. | ||||
| With the use of | ||||
| RPCSEC_GSS, other mechanisms may also be specified and used for NFSv4.1 security. | ||||
| </t> | ||||
| <t> | ||||
| To enable in-band security negotiation, the NFSv4.1 protocol | ||||
| has operations that provide the client a method of | ||||
| querying the server about its policies regarding which security | ||||
| mechanisms must be used for access to the server's file system | ||||
| resources. With this, the client can securely match the security | ||||
| mechanism that meets the policies specified at both the client and | ||||
| server. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 introduces parallel access (see <xref target="parallel_access" format="default"/>), which is | ||||
| called pNFS. | ||||
| The security framework | ||||
| described in this section is | ||||
| significantly modified by the | ||||
| introduction of pNFS (see <xref target="security_considerations_pnfs" format="default"/>), | ||||
| because data access is sometimes not over | ||||
| RPC. The level of significance varies | ||||
| with the storage protocol (see <xref target="storage_protocol" format="default"/>) and can be as low as zero | ||||
| impact (see <xref target="file_security_considerations" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="protocol_structure" numbered="true" toc="default"> | ||||
| <name>Protocol Structure</name> | ||||
| <section anchor="core_protocol" numbered="true" toc="default"> | ||||
| <name>Core Protocol</name> | ||||
| <t> | ||||
| Unlike NFSv3, which used a series of ancillary | ||||
| protocols (e.g., NLM, NSM (Network Status Monitor), MOUNT), within all minor versions | ||||
| of NFSv4 a single RPC protocol is used to make requests to | ||||
| the server. | ||||
| Facilities that had been separate protocols, such | ||||
| as locking, are now integrated within a single unified | ||||
| protocol. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="parallel_access" numbered="true" toc="default"> | ||||
| <name>Parallel Access</name> | ||||
| <t> | ||||
| Minor version 1 supports high-performance data access to a | ||||
| clustered server implementation by enabling a separation of | ||||
| metadata access and data access, with the latter done to | ||||
| multiple servers in parallel. | ||||
| </t> | ||||
| <t> | ||||
| Such parallel data access is controlled by recallable | ||||
| objects known as "layouts", which are integrated into the | ||||
| protocol locking model. Clients direct requests for | ||||
| data access to a set of data servers specified by the | ||||
| layout via a data | ||||
| storage protocol which may be NFSv4.1 or may be another | ||||
| protocol. | ||||
| </t> | ||||
| <t> | ||||
| Because the protocols used for parallel | ||||
| data access are not necessarily | ||||
| RPC-based, the RPC-based security model | ||||
| (<xref target="rpc_and_security" format="default"/>) is | ||||
| impacted (see <xref target="security_considerations_pnfs" format="default"/>). | ||||
| The degree of impact varies with the | ||||
| storage protocol (see <xref target="storage_protocol" format="default"/>) used for | ||||
| data access, and can be as low as zero (see | ||||
| <xref target="file_security_considerations" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="file_system_model" numbered="true" toc="default"> | ||||
| <name>File System Model</name> | ||||
| <t> | ||||
| The general file system | ||||
| model used for the NFSv4.1 protocol | ||||
| is the same as that of previous versions. The server file system is | ||||
| hierarchical with the regular files contained within being | ||||
| treated as opaque byte | ||||
| streams. In a slight departure, file and directory names are encoded | ||||
| with UTF-8 to deal with the basics of internationalization. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol does not require a separate | ||||
| protocol to provide for the initial mapping between path | ||||
| name and filehandle. All file systems exported by a server | ||||
| are presented as a tree so that all file systems are reachable | ||||
| from a special per-server global root filehandle. This | ||||
| allows LOOKUP operations to be used to perform functions | ||||
| previously provided by the MOUNT protocol. The server | ||||
| provides any necessary pseudo file systems to bridge | ||||
| unexported gaps between exported | ||||
| file systems. | ||||
| </t> | ||||
| <section anchor="intro_filehandles" numbered="true" toc="default"> | ||||
| <name>Filehandles</name> | ||||
| <t> | ||||
| As in previous versions of the NFS protocol, opaque | ||||
| filehandles are used to identify individual files | ||||
| and directories. Lookup-type and create operations | ||||
| translate file and directory names to | ||||
| filehandles, which are then used to identify objects | ||||
| in subsequent operations. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol provides support for | ||||
| persistent filehandles, guaranteed to be valid | ||||
| for the lifetime of the file system object designated. | ||||
| In addition, it allows servers to provide | ||||
| filehandles with more limited validity guarantees, | ||||
| called volatile filehandles. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="intro_attributes" numbered="true" toc="default"> | ||||
| <name>File Attributes</name> | ||||
| <t> | ||||
| The NFSv4.1 protocol has a rich and extensible | ||||
| file object attribute structure, which is divided | ||||
| into <bcp14>REQUIRED</bcp14>, <bcp14>RECOMMENDED</bcp14>, and named attributes | ||||
| (see <xref target="file_attributes" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| Several (but not all) of the <bcp14>REQUIRED</bcp14> attributes | ||||
| are derived from the attributes of NFSv3 (see | ||||
| the definition of the fattr3 data type in <xref target="RFC1813" format="default"/>). An example of a <bcp14>REQUIRED</bcp14> | ||||
| attribute is the file object's type (<xref target="attrdef_type" format="default"/>) so that regular files | ||||
| can be distinguished from directories (also known | ||||
| as folders in some operating environments) and | ||||
| other types of objects. <bcp14>REQUIRED</bcp14> attributes are | ||||
| discussed in <xref target="mandatory_attributes_intro" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Three examples of <bcp14>RECOMMENDED</bcp14> attributes are | ||||
| acl, sacl, and dacl. These attributes define an | ||||
| Access Control List (ACL) on a file object | ||||
| (<xref target="acl" format="default"/>). An ACL provides | ||||
| directory and file access control beyond the | ||||
| model used in NFSv3. The ACL definition allows | ||||
| for the specification of distinct sets of permissions | ||||
| for individual users and groups. In addition, | ||||
| ACL inheritance allows propagation of access | ||||
| permissions and restrictions down a directory tree | ||||
| as file system objects are created. <bcp14>RECOMMENDED</bcp14> | ||||
| attributes are discussed in <xref target="recommended_attributes_intro" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| A named attribute is an opaque byte stream that is associated | ||||
| with a directory or file and referred to by a string name. | ||||
| Named attributes are meant to be used by client applications | ||||
| as a method to associate application-specific data with a | ||||
| regular file or directory. NFSv4.1 modifies named attributes | ||||
| relative to NFSv4.0 by tightening the allowed operations in | ||||
| order to prevent the development of non-interoperable | ||||
| implementations. Named attributes are discussed in <xref target="named_attributes_intro" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="PREP-intro" numbered="true" toc="default"> | ||||
| <name>Multi-Server Namespace</name> | ||||
| <t> | ||||
| NFSv4.1 contains a number of features to allow | ||||
| implementation of namespaces that cross server boundaries | ||||
| and that allow and facilitate a nondisruptive transfer of | ||||
| support for individual file systems between servers. They | ||||
| are all based upon attributes that allow one file system to | ||||
| specify alternate, additional, and new location information | ||||
| that specifies how the client may access | ||||
| that file system. | ||||
| </t> | ||||
| <t> | ||||
| These attributes can be used to provide for individual active | ||||
| file systems: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Alternate network addresses to access the | ||||
| current file system instance. | ||||
| </li> | ||||
| <li> | ||||
| The locations of alternate file system instances | ||||
| or replicas to be used in the event that the current | ||||
| file system instance becomes unavailable. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| These file system location | ||||
| attributes may be used together with the concept | ||||
| of absent file systems, in which a position in the server | ||||
| namespace is associated with locations on other servers without | ||||
| there being any corresponding file system instance on the | ||||
| current server. For example, | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| These attributes may be used with absent file systems | ||||
| to implement referrals whereby one server may direct the | ||||
| client to a file system provided by another server. This | ||||
| allows extensive multi-server namespaces to be constructed. | ||||
| </li> | ||||
| <li> | ||||
| These attributes may be provided when a previously | ||||
| present file system becomes absent. This allows | ||||
| nondisruptive migration of file systems to alternate | ||||
| servers. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="intro_locking" numbered="true" toc="default"> | ||||
| <name>Locking Facilities</name> | ||||
| <t> | ||||
| As mentioned previously, NFSv4.1 is a single protocol that | ||||
| includes locking facilities. These locking facilities | ||||
| include support for many types of locks, including several | ||||
| kinds of recallable locks. Recallable locks such as | ||||
| delegations allow the client to be assured that certain | ||||
| events will not occur so long as that lock is held. When | ||||
| circumstances change, the lock is recalled | ||||
| via a callback request. The assurances provided by | ||||
| delegations allow more extensive caching to be done safely | ||||
| when circumstances allow it. | ||||
| </t> | ||||
| <t> | ||||
| The types of locks are: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Share reservations as established by OPEN operations. | ||||
| </li> | ||||
| <li> | ||||
| Byte-range locks. | ||||
| </li> | ||||
| <li> | ||||
| File delegations, which are recallable locks that assure | ||||
| the holder that inconsistent opens and file changes cannot | ||||
| occur so long as the delegation is held. | ||||
| </li> | ||||
| <li> | ||||
| Directory delegations, which are recallable locks | ||||
| that assure the holder that inconsistent directory | ||||
| modifications cannot occur so long as the delegation | ||||
| is held. | ||||
| </li> | ||||
| <li> | ||||
| Layouts, which are recallable objects that assure the | ||||
| holder that access to the file data may be | ||||
| performed directly by the client and that no change | ||||
| to the data's location that is inconsistent with that access | ||||
| may be made so long as the layout is held. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| All locks for a given client are tied together under a | ||||
| single client-wide lease. All requests made on sessions | ||||
| associated with the client renew that lease. When the client's | ||||
| lease is not promptly renewed, the client's locks are subject to revocation. | ||||
| In the event of server restart, clients have the | ||||
| opportunity to safely reclaim their locks within a special | ||||
| grace period. | ||||
| </t> | ||||
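| <t> | ||||
| As an illustration only, the client-wide lease described above can be | ||||
| sketched in Python as follows. The ClientLease class, its method names, | ||||
| and the use of plain numbers as timestamps are assumptions made for | ||||
| this sketch; none of them is defined by the protocol. | ||||
| </t> | ||||

```python
class ClientLease:
    """Hypothetical server-side record of one client-wide lease."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time  # lease period in seconds
        self.last_renewal = now       # time of the most recent renewal
        self.locks = set()            # every lock the client holds

    def on_request(self, now):
        # Any request on a session associated with the client renews the lease.
        self.last_renewal = now

    def expired(self, now):
        return now - self.last_renewal > self.lease_time

    def revoke_if_expired(self, now):
        # Once the lease lapses, all locks are subject to revocation together.
        # This sketch revokes at once; a real server may choose to wait.
        if not self.expired(now):
            return set()
        revoked = set(self.locks)
        self.locks.clear()
        return revoked
```

| <t> | ||||
| Because every lock hangs off the same lease record, a single renewal | ||||
| covers all of them, and a single expiration exposes all of them. | ||||
| </t> | ||||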
| </section> | ||||
| </section> | ||||
| <section anchor="intro_differences" numbered="true" toc="default"> | ||||
| <name>Differences from NFSv4.0</name> | ||||
| <t> | ||||
| The following summarizes the major differences between minor version | ||||
| 1 and the base protocol: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Implementation of the sessions model (<xref target="Session" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Parallel access to data (<xref target="pnfs" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Addition of the RECLAIM_COMPLETE operation to better structure | ||||
| the lock reclamation process (<xref target="OP_RECLAIM_COMPLETE" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Enhanced delegation support as follows. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Delegations on directories and other | ||||
| file types in addition to regular files (<xref target="OP_GET_DIR_DELEGATION" format="default"/>, <xref target="OP_WANT_DELEGATION" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Operations to optimize acquisition of recalled | ||||
| or denied delegations (<xref target="OP_WANT_DELEGATION" format="default"/>, <xref target="OP_CB_PUSH_DELEG" format="default"/>, <xref target="OP_CB_RECALLABLE_OBJ_AVAIL" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Notifications of changes to files and directories | ||||
| (<xref target="OP_GET_DIR_DELEGATION" format="default"/>, <xref target="OP_CB_NOTIFY" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| A method to allow a server to indicate that it is | ||||
| recalling one or more delegations for resource | ||||
| management reasons, and thus a method to allow | ||||
| the client to pick which delegations to return | ||||
| (<xref target="OP_CB_RECALL_ANY" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| Attributes can be set atomically | ||||
| during exclusive file create via the OPEN operation | ||||
| (see the new EXCLUSIVE4_1 creation method in | ||||
| <xref target="OP_OPEN" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Open files can be preserved if removed and the | ||||
| hard link count ("hard link" is defined in | ||||
| an <xref target="hardlink" format="default">Open Group</xref> standard) goes | ||||
| to zero, thus obviating the | ||||
| need for clients to rename deleted files to | ||||
| partially hidden names -- colloquially called | ||||
| "silly rename" (see the new | ||||
| OPEN4_RESULT_PRESERVE_UNLINKED reply flag in | ||||
| <xref target="OP_OPEN" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Improved compatibility with Microsoft Windows for | ||||
| Access Control Lists (<xref target="attrdef_sacl" format="default"/>, <xref target="attrdef_dacl" format="default"/>, <xref target="auto_inherit" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Data retention (<xref target="retention" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Identification of the implementation of the NFS client | ||||
| and server (<xref target="OP_EXCHANGE_ID" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| Support for notification of the availability of | ||||
| byte-range locks (see the new | ||||
| OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in <xref target="OP_OPEN" format="default"/> and see <xref target="OP_CB_NOTIFY_LOCK" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms | ||||
| <xref target="RFC2847" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="Core_Infrastructure" numbered="true" toc="default"> | ||||
| <name>Core Infrastructure</name> | ||||
| <section anchor="Introduction" numbered="true" toc="default"> | ||||
| <name>Introduction</name> | ||||
| <t> | ||||
| NFSv4.1 relies on core infrastructure common to nearly | ||||
| every operation. This core infrastructure is described in the remainder | ||||
| of this section. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Introduction --> | ||||
| <section anchor="RPC_and_XDR" numbered="true" toc="default"> | ||||
| <name>RPC and XDR</name> | ||||
| <t> | ||||
| The NFSv4.1 protocol is a Remote Procedure Call (RPC) | ||||
| application that uses RPC version 2 and the corresponding eXternal | ||||
| Data Representation (XDR) as defined in | ||||
| <xref target="RFC5531" format="default"/> and | ||||
| <xref target="RFC4506" format="default"/>. | ||||
| </t> | ||||
| <section anchor="RPC-based_Security" numbered="true" toc="default"> | ||||
| <name>RPC-Based Security</name> | ||||
| <t> | ||||
| Previous NFS versions have been thought of as having a | ||||
| host-based authentication model, where the NFS server | ||||
| authenticates the NFS client, and trusts the client | ||||
| to authenticate all users. | ||||
| Actually, NFS has always depended on RPC for | ||||
| authentication. One of the first forms of RPC authentication, | ||||
| AUTH_SYS, had no strong authentication and | ||||
| required a host-based authentication | ||||
| approach. NFSv4.1 also depends on RPC for basic security | ||||
| services and mandates RPC support for a user-based | ||||
| authentication model. The user-based authentication | ||||
| model has user principals authenticated by a server, and | ||||
| in turn the server authenticated by user principals. | ||||
| RPC provides some basic security services that are used | ||||
| by NFSv4.1. | ||||
| </t> | ||||
| <section anchor="RPC_Security_Flavors" numbered="true" toc="default"> | ||||
| <name>RPC Security Flavors</name> | ||||
| <t> | ||||
| As described in "Authentication", <xref target="RFC5531" sectionFormat="of" section="7"/>, | ||||
| RPC security is encapsulated in the RPC header, via a | ||||
| security or authentication flavor, and information | ||||
| specific to the specified security flavor. | ||||
| Every RPC header conveys information used to identify | ||||
| and authenticate a client and server. As discussed in | ||||
| <xref target="RPCSEC_GSS_and_Security_Services" format="default"/>, | ||||
| some security flavors provide additional security | ||||
| services. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 clients and servers <bcp14>MUST</bcp14> implement RPCSEC_GSS. | ||||
| (This requirement to implement is not a requirement to | ||||
| use.) Other flavors, such as AUTH_NONE and | ||||
| AUTH_SYS, <bcp14>MAY</bcp14> be implemented as well. | ||||
| </t> | ||||
| <section anchor="RPCSEC_GSS_and_Security_Services" numbered="true" toc="default"> | ||||
| <name>RPCSEC_GSS and Security Services</name> | ||||
| <t> | ||||
| RPCSEC_GSS <xref target="RFC2203" format="default"/> uses the | ||||
| functionality of GSS-API <xref target="RFC2743" format="default"/>. This allows for the | ||||
| use of various security mechanisms by the RPC layer | ||||
| without the additional implementation overhead of | ||||
| adding RPC security flavors. | ||||
| </t> | ||||
| <section anchor="Authentication_Integrity_Privacy" numbered="true" toc="default"> | ||||
| <name>Identification, Authentication, Integrity, Privacy</name> | ||||
| <t> | ||||
| Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate | ||||
| users on clients to servers, and servers to users. It can also | ||||
| perform integrity checking on the entire RPC message, including | ||||
| the RPC header, and on the arguments or results. Finally, privacy, | ||||
| usually via encryption, is a service available with RPCSEC_GSS. | ||||
| Privacy is performed on the arguments and results. Note that | ||||
| if privacy is selected, integrity, authentication, and identification | ||||
| are enabled. | ||||
| If privacy is not selected, but integrity is selected, authentication | ||||
| and identification are enabled. If integrity and privacy are not | ||||
| selected, but authentication is enabled, | ||||
| identification is enabled. RPCSEC_GSS does not provide identification as | ||||
| a separate service. | ||||
| </t> | ||||
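| <t> | ||||
| The way a selected service implies the weaker ones can be sketched | ||||
| in Python as follows. The numeric values are those of rpc_gss_service_t | ||||
| in <xref target="RFC2203" format="default"/>; the RpcGssService and | ||||
| enabled_properties names are hypothetical and exist only for this | ||||
| illustration. | ||||
| </t> | ||||

```python
from enum import IntEnum


class RpcGssService(IntEnum):
    """RPCSEC_GSS service values (rpc_gss_service_t in RFC 2203)."""
    NONE = 1       # rpc_gss_svc_none: authentication only
    INTEGRITY = 2  # rpc_gss_svc_integrity
    PRIVACY = 3    # rpc_gss_svc_privacy


def enabled_properties(service):
    """Return the security properties in effect for the selected service."""
    # Identification is never a separate service; it comes with authentication.
    props = {"identification", "authentication"}
    if service >= RpcGssService.INTEGRITY:
        props.add("integrity")
    if service == RpcGssService.PRIVACY:
        props.add("privacy")
    return props
```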
| <t> | ||||
| Although GSS-API has an authentication service distinct from its | ||||
| privacy and integrity services, GSS-API's | ||||
| authentication service is not used for RPCSEC_GSS's authentication | ||||
| service. Instead, each RPC request and response header is | ||||
| integrity protected with the GSS-API integrity service, and | ||||
| this allows RPCSEC_GSS to offer per-RPC authentication and | ||||
| identity. See <xref target="RFC2203" format="default"/> for more information. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 clients and servers <bcp14>MUST</bcp14> support RPCSEC_GSS's integrity and authentication | ||||
| service. NFSv4.1 servers <bcp14>MUST</bcp14> support RPCSEC_GSS's privacy service. | ||||
| NFSv4.1 clients <bcp14>SHOULD</bcp14> support RPCSEC_GSS's privacy service. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Identity, Authentication, Integrity, Privacy --> | ||||
| <section anchor="security_mechs" numbered="true" toc="default"> | ||||
| <name>Security Mechanisms for NFSv4.1</name> | ||||
| <t> | ||||
| RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that | ||||
| provide security services. Therefore, NFSv4.1 clients and servers | ||||
| <bcp14>MUST</bcp14> support the Kerberos V5 security mechanism. | ||||
| </t> | ||||
| <t> | ||||
| The use of RPCSEC_GSS requires selection of a mechanism, | ||||
| quality of protection (QOP), and service (authentication, | ||||
| integrity, privacy). For the mandated security mechanisms, | ||||
| NFSv4.1 specifies that a QOP of zero is used, leaving it up | ||||
| to the mechanism or the mechanism's configuration to map | ||||
| QOP zero to | ||||
| an appropriate level of protection. | ||||
| Each mandated mechanism specifies a minimum set of cryptographic | ||||
| algorithms for implementing integrity and privacy. NFSv4.1 | ||||
| clients and servers <bcp14>MUST</bcp14> be implemented on operating environments | ||||
| that comply with the <bcp14>REQUIRED</bcp14> cryptographic algorithms | ||||
| of each <bcp14>REQUIRED</bcp14> mechanism. | ||||
| </t> | ||||
| <section anchor="kerberosv5" numbered="true" toc="default"> | ||||
| <name>Kerberos V5</name> | ||||
| <t> | ||||
| The Kerberos V5 GSS-API mechanism as described in | ||||
| <xref target="RFC4121" format="default"/> <bcp14>MUST</bcp14> be implemented with | ||||
| the RPCSEC_GSS services as specified in the following | ||||
| table: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| column descriptions: | ||||
| 1 == number of pseudo flavor | ||||
| 2 == name of pseudo flavor | ||||
| 3 == mechanism's OID | ||||
| 4 == RPCSEC_GSS service | ||||
| 5 == NFSv4.1 clients MUST support | ||||
| 6 == NFSv4.1 servers MUST support | ||||
| 1      2     3                    4                     5   6 | ||||
| ------------------------------------------------------------------ | ||||
| 390003 krb5  1.2.840.113554.1.2.2 rpc_gss_svc_none      yes yes | ||||
| 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes | ||||
| 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy    no yes | ||||
| ]]></artwork> | ||||
| <t> | ||||
| Note that the number and name of the pseudo flavor | ||||
| are presented here as a mapping aid to the implementor. | ||||
| Because the NFSv4.1 protocol includes a method to negotiate | ||||
| security and it understands the GSS-API mechanism, the pseudo flavor | ||||
| is not needed. The pseudo flavor is needed for NFSv3 because its security negotiation is done via | ||||
| the MOUNT protocol as described in <xref target="RFC2623" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| At the time NFSv4.1 was specified, the Advanced Encryption | ||||
| Standard (AES) with HMAC-SHA1 was | ||||
| a <bcp14>REQUIRED</bcp14> algorithm set for Kerberos V5. In contrast, when | ||||
| NFSv4.0 was specified, weaker algorithm sets were <bcp14>REQUIRED</bcp14> for | ||||
| Kerberos V5, and were <bcp14>REQUIRED</bcp14> in the NFSv4.0 specification, because | ||||
| the Kerberos V5 specification at the time did not specify stronger | ||||
| algorithms. | ||||
| The NFSv4.1 specification does not specify <bcp14>REQUIRED</bcp14> algorithms | ||||
| for Kerberos V5, and instead, the implementor is expected | ||||
| to track the evolution of the Kerberos V5 standard if and when | ||||
| stronger algorithms are specified. | ||||
| </t> | ||||
| <section anchor="krb5_sec_consider" numbered="true" toc="default"> | ||||
| <name>Security Considerations for Cryptographic Algorithms in Kerberos V5</name> | ||||
| <t> | ||||
| When deploying NFSv4.1, the strength of the security achieved depends | ||||
| on the existing Kerberos V5 infrastructure. The algorithms | ||||
| of Kerberos V5 are not directly exposed to or selectable by the | ||||
| client or server, so there is some due diligence required by | ||||
| the user of NFSv4.1 to ensure that security is acceptable | ||||
| where needed. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] Kerberos V5 --> | ||||
| </section> | ||||
| <!-- [auth] Security mechanisms for NFSv4.1 --> | ||||
| <section anchor="GSS_Server_Principal" numbered="true" toc="default"> | ||||
| <name>GSS Server Principal</name> | ||||
| <t> | ||||
| Regardless of what security mechanism under RPCSEC_GSS | ||||
| is being used, the NFS server <bcp14>MUST</bcp14> identify itself | ||||
| in GSS-API via a GSS_C_NT_HOSTBASED_SERVICE name type. | ||||
| GSS_C_NT_HOSTBASED_SERVICE names are of the form: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| service@hostname | ||||
| ]]></artwork> | ||||
| <t> | ||||
| For NFS, the "service" element is | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| nfs | ||||
| ]]></artwork> | ||||
| <t> | ||||
| Implementations of security mechanisms will convert | ||||
| nfs@hostname to various different forms. For Kerberos | ||||
| V5, the following form is <bcp14>RECOMMENDED</bcp14>: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| nfs/hostname | ||||
| ]]></artwork> | ||||
| </section> | ||||
| <!-- [auth] GSS Server Principal --> | ||||
| </section> | ||||
| <!-- [auth] RPCSEC_GSS and Security Services --> | ||||
| </section> | ||||
| <!-- [auth] RPC Security Flavors --> | ||||
| </section> | ||||
| <!-- [auth] RPC-based Security --> | ||||
| </section> | ||||
| <!-- [auth] RPC and XDR --> | ||||
| <section anchor="COMPOUND_and_CB_COMPOUND" numbered="true" toc="default"> | ||||
| <name>COMPOUND and CB_COMPOUND</name> | ||||
| <t> | ||||
| A significant departure from the versions of the NFS | ||||
| protocol before NFSv4 is the introduction of the | ||||
| COMPOUND procedure. For the NFSv4 protocol, | ||||
| in all minor versions, there are exactly two RPC procedures, | ||||
| NULL and COMPOUND. The COMPOUND procedure is defined | ||||
| as a series of individual operations and these operations | ||||
| perform the sorts of functions performed by traditional | ||||
| NFS procedures. | ||||
| </t> | ||||
| <t> | ||||
| The operations combined within a COMPOUND | ||||
| request are evaluated in order by the server, without | ||||
| any atomicity guarantees. A limited set of facilities | ||||
| exists to pass results from one operation to another. Once an | ||||
| operation returns a failing result, the evaluation ends | ||||
| and the results of all | ||||
| evaluated operations are returned to the client. | ||||
| </t> | ||||
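| <t> | ||||
| The evaluation rule above can be sketched in Python as follows. The | ||||
| run_compound function and the representation of an operation as a | ||||
| (name, callable) pair are assumptions made for this illustration; only | ||||
| the status values NFS4_OK and NFS4ERR_NOENT come from the protocol. | ||||
| </t> | ||||

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2


def run_compound(operations):
    """Evaluate operations in order and stop after the first failure.

    `operations` is a list of (name, callable) pairs; each callable
    returns an NFSv4 status code.
    """
    results = []
    for name, op in operations:
        status = op()
        results.append((name, status))
        if status != NFS4_OK:
            # Later operations are not evaluated. Earlier ones are not
            # rolled back, since COMPOUND offers no atomicity.
            break
    return results
```

| <t> | ||||
| The returned list always ends with either the last operation of the | ||||
| request or the first operation that failed, which mirrors what the | ||||
| client sees in the reply. | ||||
| </t> | ||||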
| <t> | ||||
| With the use of the COMPOUND procedure, the client is able to build | ||||
| simple or complex requests. These COMPOUND requests allow for a | ||||
| reduction in the number of RPCs needed for logical file system | ||||
| operations. For example, multi-component lookup requests can | ||||
| be constructed by combining multiple LOOKUP operations. Those | ||||
| can be further combined with operations such as GETATTR, READDIR, | ||||
| or OPEN plus READ to do more complicated sets of operations without | ||||
| incurring additional latency. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 also contains a considerable set of | ||||
| callback operations in which the server makes an RPC | ||||
| directed at the client. Callback RPCs have a similar | ||||
| structure to that of the normal server requests. | ||||
| In all minor versions of the NFSv4 protocol, | ||||
| there are two callback RPC procedures: | ||||
| CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is defined | ||||
| in an analogous fashion to that of COMPOUND | ||||
| with its own set of callback operations. | ||||
| </t> | ||||
| <t> | ||||
| The addition of new server and callback operations within the | ||||
| COMPOUND and CB_COMPOUND request | ||||
| framework provides a means of extending the protocol in | ||||
| subsequent minor versions. | ||||
| </t> | ||||
| <t> | ||||
| Except for a small number of operations needed for session | ||||
| creation, server requests and callback requests are performed | ||||
| within the context of a session. Sessions provide a client | ||||
| context for every request and support robust replay | ||||
| protection for non-idempotent requests. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] COMPOUND and CB_COMPOUND --> | ||||
| <section anchor="Client_Identifiers" numbered="true" toc="default"> | ||||
| <name>Client Identifiers and Client Owners</name> | ||||
| <t> | ||||
| For each operation that obtains or depends on locking state, the | ||||
| specific client needs to be identifiable by the server. | ||||
| </t> | ||||
| <t> | ||||
| Each distinct client instance is represented | ||||
| by a client ID. A client ID is a 64-bit identifier | ||||
| representing a specific client at a given time. | ||||
| The client ID is changed whenever the client re-initializes, | ||||
| and may change when the server re-initializes. | ||||
| Client IDs are used to support lock identification | ||||
| and crash recovery. | ||||
| </t> | ||||
| <t> | ||||
| During steady-state operation, | ||||
| the client ID associated with each operation | ||||
| is derived from the session (see <xref target="Session" format="default"/>) on which the operation is sent. A session is associated with | ||||
| a client ID when the session is created. | ||||
| </t> | ||||
| <t> | ||||
| Unlike NFSv4.0, the only NFSv4.1 operations possible before a | ||||
| client ID is established are those needed to | ||||
| establish the client ID. | ||||
| </t> | ||||
| <t> | ||||
| A sequence of an EXCHANGE_ID operation followed by a | ||||
| CREATE_SESSION operation using that client ID | ||||
| (eir_clientid as returned from EXCHANGE_ID) | ||||
| is required to establish and confirm the | ||||
| client ID on the server. Establishment of identification by a | ||||
| new incarnation of the client also has the effect of immediately | ||||
| releasing any locking state that a previous incarnation of that | ||||
| same client might have had on the server. Such released state | ||||
| would include all byte-range lock, share reservation, layout state, and -- where the server supports neither the CLAIM_DELEGATE_PREV nor CLAIM_DELEG_CUR_FH claim types -- all delegation state associated with the same client with the same | ||||
| identity. For discussion of delegation state recovery, see | ||||
| <xref target="delegation_recovery" format="default"/>. For discussion of layout state | ||||
| recovery, see <xref target="pnfs_client_recovery" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Releasing such state requires that the server be able to determine | ||||
| that one client instance is the successor of another. Where this | ||||
| cannot be done, for any of a number of reasons, the locking state | ||||
| will remain for a time subject to lease expiration | ||||
| (see <xref target="lease_renewal" format="default"/>) | ||||
| and the new client will need to wait for | ||||
| such state to be removed, if it makes conflicting lock requests. | ||||
| </t> | ||||
| <t> | ||||
| Client identification is encapsulated in the following client owner | ||||
| data type: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct client_owner4 { | ||||
| verifier4 co_verifier; | ||||
| opaque co_ownerid<NFS4_OPAQUE_LIMIT>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The first field, co_verifier, is a client incarnation | ||||
| verifier, allowing the server to distinguish successive incarnations | ||||
| (e.g., reboots) of the same client. The server will start the process of | ||||
| canceling the client's leased state if co_verifier | ||||
| differs from what the server has previously | ||||
| recorded for the identified client (as specified in | ||||
| the co_ownerid field). | ||||
| </t> | ||||
| <t> | ||||
| The second field, co_ownerid, is a variable-length string that uniquely defines | ||||
| the client so that subsequent instances of the same client bear the | ||||
| same co_ownerid with a different verifier. | ||||
| </t> | ||||
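| <t> | ||||
| How a server might use the two fields together can be sketched in | ||||
| Python as follows. The OwnerTable and ClientRecord classes are | ||||
| hypothetical. The sketch is also simplified: it discards the old state | ||||
| as soon as a new verifier is seen, whereas the text above describes the | ||||
| release as taking effect once the new incarnation has established and | ||||
| confirmed its client ID, and it omits any check of the requesting | ||||
| principal. | ||||
| </t> | ||||

```python
class ClientRecord:
    """State a server keeps for one incarnation of a client."""

    def __init__(self, verifier):
        self.verifier = verifier  # co_verifier of this incarnation
        self.state = []           # locks, opens, layouts, and so on


class OwnerTable:
    """Hypothetical server table keyed by co_ownerid."""

    def __init__(self):
        self.records = {}

    def exchange_id(self, co_ownerid, co_verifier):
        record = self.records.get(co_ownerid)
        if record is None:
            record = self.records[co_ownerid] = ClientRecord(co_verifier)
            return record, "new client"
        if record.verifier != co_verifier:
            # Same owner string, different verifier: the client restarted,
            # so the state of the previous incarnation is dropped.
            record = self.records[co_ownerid] = ClientRecord(co_verifier)
            return record, "new incarnation"
        return record, "same incarnation"
```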
| <t> | ||||
| There are several considerations for how the client | ||||
| generates the co_ownerid string: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The string should be unique so that multiple clients | ||||
| do not present the same string. The consequences of | ||||
| two clients presenting the same string range from | ||||
| one client getting an error to one client having its | ||||
| leased state abruptly and unexpectedly canceled. | ||||
| </li> | ||||
| <li> | ||||
| The string should be selected so that subsequent incarnations | ||||
| (e.g., restarts) of the same client cause the client to present | ||||
| the same string. The implementor | ||||
| is cautioned against an approach that requires the string to | ||||
| be recorded in a local file because this precludes the use | ||||
| of the implementation in an environment where there is no local | ||||
| disk and all file access is from an NFSv4.1 server. | ||||
| </li> | ||||
| <li> | ||||
| The string should be the same for each server network address that | ||||
| the client accesses. | ||||
| This way, if a server has multiple interfaces, the client | ||||
| can trunk traffic over multiple network paths | ||||
| as described in <xref target="Trunking" format="default"/>. | ||||
| (Note: the precise opposite was advised in the NFSv4.0 | ||||
| specification <xref target="RFC3530" format="default"/>.) | ||||
| </li> | ||||
| <li> | ||||
| The algorithm for generating the string should not | ||||
| assume that the client's network address will not | ||||
| change, unless the client implementation knows it | ||||
| is using statically assigned network addresses. | ||||
| This includes changes between client incarnations | ||||
| and even changes while the client is still running | ||||
| in its current incarnation. Thus, with dynamic | ||||
| address assignment, if the | ||||
| client includes just the client's network address | ||||
| in the co_ownerid string, there is a real risk | ||||
| that after the | ||||
| client gives up the network address, another | ||||
| client, using a similar algorithm for generating | ||||
| the co_ownerid string, would generate a conflicting | ||||
| co_ownerid string. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Given the above considerations, an example of a well-generated co_ownerid | ||||
| string is one that includes: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If applicable, the client's statically assigned network address. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Additional information that tends to be unique, such as one or more | ||||
| of: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client machine's serial number (for privacy reasons, it is best | ||||
| to perform some one-way function on the serial number). | ||||
| </li> | ||||
| <li> | ||||
| A Media Access Control (MAC) address (again, a one-way function should be performed). | ||||
| </li> | ||||
| <li> | ||||
| The timestamp of when the NFSv4.1 software was first installed | ||||
| on the client (though this is subject to the previously mentioned | ||||
| caution about using information that is stored in a file, because the | ||||
| file might only be accessible over NFSv4.1). | ||||
| </li> | ||||
| <li> | ||||
| A true random number. However, since this number ought to be the same | ||||
| between client incarnations, this shares the same problem as that of | ||||
| using the timestamp of the software installation. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| For a user-level NFSv4.1 client, it should contain additional | ||||
| information to distinguish the client from other user-level clients | ||||
| running on the same host, such as a process identifier or other unique | ||||
| sequence. | ||||
| </li> | ||||
| </ul> | ||||
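| <t> | ||||
| One possible way to assemble such a string is sketched below in Python. | ||||
| The function names, the field labels, the "/" separator, and the choice | ||||
| of SHA-256 as the one-way function are all assumptions made for this | ||||
| illustration; the protocol treats co_ownerid as opaque. The caller is | ||||
| expected to supply the serial number and MAC address, since how they | ||||
| are obtained is platform specific. An unkeyed hash of a low-entropy | ||||
| value can be guessed by enumeration, so an implementation with stronger | ||||
| privacy needs might prefer a keyed construction. | ||||
| </t> | ||||

```python
import hashlib

NFS4_OPAQUE_LIMIT = 1024  # maximum size of co_ownerid, in octets


def one_way(value):
    """Hash an identifying value so the raw value is not disclosed."""
    return hashlib.sha256(value).hexdigest()


def make_co_ownerid(serial, mac, static_addr=None, instance=None):
    """Build a co_ownerid that is stable across client restarts.

    serial, mac: bytes that identify the machine (hashed before use).
    static_addr: the client's address, only if statically assigned.
    instance:    extra text for a user-level client, such as a process ID.
    Nothing here depends on the server's address, so the same string is
    presented to every server network address.
    """
    parts = ["serial=" + one_way(serial), "mac=" + one_way(mac)]
    if static_addr is not None:
        parts.insert(0, "addr=" + static_addr)
    if instance is not None:
        parts.append("instance=" + instance)
    owner = "/".join(parts).encode("utf-8")
    if len(owner) > NFS4_OPAQUE_LIMIT:
        raise ValueError("co_ownerid exceeds NFS4_OPAQUE_LIMIT")
    return owner
```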
| <t> | ||||
| The client ID is assigned by the server (the eir_clientid result from EXCHANGE_ID) | ||||
| and should be chosen so that it will not | ||||
| conflict with a client ID previously assigned by the | ||||
| server. This applies across server restarts. | ||||
| </t> | ||||
| <t> | ||||
| In the event of a server restart, a client may find | ||||
| out that its current client ID is no longer valid when | ||||
| it receives an NFS4ERR_STALE_CLIENTID error. The precise | ||||
| circumstances depend on the characteristics of the | ||||
| sessions involved, specifically whether the session is | ||||
| persistent (see <xref target="Persistence" format="default"/>), but in | ||||
| each case the client will receive this error when it attempts | ||||
| to establish a new session with the existing client ID, | ||||
| indicating that a new | ||||
| client ID needs to be obtained via EXCHANGE_ID and the new session | ||||
| established with that client ID. | ||||
| </t> | ||||
| <t> | ||||
| When a session is not persistent, the client will find out that | ||||
| it needs to create a new session as a result of getting an | ||||
| NFS4ERR_BADSESSION, since the session in question was lost | ||||
| as part of a server restart. When the existing client ID is | ||||
| presented to a server as part of creating a session | ||||
| and that client ID is not recognized, as would happen after a server | ||||
| restart, the server will reject the request with the error | ||||
| NFS4ERR_STALE_CLIENTID. | ||||
| </t> | ||||
| <t> | ||||
| In the case of the session being persistent, the | ||||
| client will re-establish communication using the | ||||
| existing session after the restart. This session | ||||
| will be associated with the existing client ID but | ||||
| may only be used to retransmit operations that the | ||||
| client previously transmitted and did not see replies | ||||
| to. Replies to operations that the server previously performed | ||||
| will come from the reply cache; otherwise, | ||||
| NFS4ERR_DEADSESSION will be returned. | ||||
| Hence, such a session is referred to as "dead". In this situation, | ||||
| in order to perform new operations, the client needs to | ||||
| establish a new session. If an attempt is made to | ||||
| establish this new session with the existing client ID, | ||||
| the server will reject the request with | ||||
| NFS4ERR_STALE_CLIENTID. | ||||
| </t> | ||||
| <t> | ||||
| When NFS4ERR_STALE_CLIENTID is received in either of | ||||
| these situations, the client needs to obtain a | ||||
| new client ID by use of the EXCHANGE_ID operation, then | ||||
| use that client ID as the basis of a new session, and | ||||
| then proceed to | ||||
| any other necessary recovery for the server restart case (see | ||||
| <xref target="server_failure" format="default"/>). | ||||
| </t> | ||||
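| <t> | ||||
| The recovery sequence described above can be sketched as follows. This is a | ||||
| minimal illustration, not a protocol implementation: the RestartedServer class, | ||||
| the recover_session function, and the integer client IDs are all hypothetical | ||||
| names and values introduced only for this example. | ||||
| </t> | ||||

```python
from enum import Enum, auto


class Status(Enum):
    OK = auto()
    STALE_CLIENTID = auto()


class RestartedServer:
    """Toy server (hypothetical) that forgot every client ID issued before restart."""

    def __init__(self):
        self.known = set()
        self.next_id = 100

    def exchange_id(self, owner):
        # Hand out a fresh client ID for the given client owner.
        self.next_id += 1
        self.known.add(self.next_id)
        return self.next_id

    def create_session(self, client_id):
        # An unrecognized client ID is rejected, as after a server restart.
        if client_id not in self.known:
            return Status.STALE_CLIENTID, None
        return Status.OK, ("session", client_id)


def recover_session(server, client_id, owner):
    """Try the existing client ID first; on STALE_CLIENTID obtain a new one."""
    status, session = server.create_session(client_id)
    if status is Status.STALE_CLIENTID:
        client_id = server.exchange_id(owner)
        status, session = server.create_session(client_id)
    if status is not Status.OK:
        raise RuntimeError("could not establish a session")
    return client_id, session
```

| <t> | ||||
| Any further recovery for the server restart case would follow once the new | ||||
| session exists; that part is outside the scope of this sketch. | ||||
| </t> | ||||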
| <t> | ||||
| See the descriptions of EXCHANGE_ID | ||||
| (<xref target="OP_EXCHANGE_ID" format="default"/>) and CREATE_SESSION | ||||
| (<xref target="OP_CREATE_SESSION" format="default"/>) for a complete | ||||
| specification of these operations. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Upgrade from NFSv4.0 to NFSv4.1</name> | ||||
| <t> | ||||
| To facilitate upgrade from NFSv4.0 to NFSv4.1, a server | ||||
| may compare a value of data type client_owner4 in an EXCHANGE_ID with a | ||||
| value of data type nfs_client_id4 that was established using the SETCLIENTID operation of | ||||
| NFSv4.0. A server that does so will allow | ||||
| an upgraded client to avoid waiting | ||||
| until the lease (i.e., the lease established by the NFSv4.0 instance | ||||
| of the client) expires. | ||||
| This requires that the value of data type client_owner4 be constructed | ||||
| the same way as the value of data type nfs_client_id4. If the latter's | ||||
| contents included the server's network address (per the | ||||
| recommendations of the NFSv4.0 specification <xref target="RFC3530" format="default"/>), and | ||||
| the NFSv4.1 client does not wish to use a client | ||||
| ID that prevents trunking, it should send two | ||||
| EXCHANGE_ID operations. The first EXCHANGE_ID will | ||||
| have a client_owner4 equal to the nfs_client_id4. | ||||
| This will clear the state created by the NFSv4.0 | ||||
| client. The second EXCHANGE_ID will not have the | ||||
| server's network address. The state created for the | ||||
| second EXCHANGE_ID will not have to wait for lease | ||||
| expiration, because there will be no state to expire. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Server Release of Client ID</name> | ||||
| <t> | ||||
| NFSv4.1 introduces a new operation called | ||||
| DESTROY_CLIENTID (<xref target="OP_DESTROY_CLIENTID" format="default"/>), | ||||
| which the client <bcp14>SHOULD</bcp14> use to destroy a client ID it | ||||
| no longer needs. This permits graceful, bilateral release of | ||||
| a client ID. The operation cannot be used if there are sessions | ||||
| associated with the client ID, or state with an unexpired lease. | ||||
| </t> | ||||
| <t> | ||||
| If the server determines that the client holds no associated state | ||||
| for its client ID (associated state includes unrevoked sessions, | ||||
| opens, locks, delegations, layouts, and wants), the server <bcp14>MAY</bcp14> | ||||
| choose to unilaterally release the client ID in order to | ||||
| conserve resources. | ||||
| If the client | ||||
| contacts the server after this release, the server | ||||
| <bcp14>MUST</bcp14> ensure that the client receives the appropriate error | ||||
| so that it will use the EXCHANGE_ID/CREATE_SESSION | ||||
| sequence to establish a new client ID. | ||||
| The server ought to be very hesitant to | ||||
| release a client ID since the resulting work on the | ||||
| client to recover from such an event will be the same | ||||
| burden as if the server had failed and restarted. | ||||
| Typically, a server would not release a client ID | ||||
| unless there had been no activity from that client | ||||
| for many minutes. As long as there are sessions, | ||||
| opens, locks, delegations, layouts, or wants, the | ||||
| server <bcp14>MUST NOT</bcp14> release the client ID. See <xref target="loss_of_session" format="default"/> for discussion on | ||||
| releasing inactive sessions. | ||||
| </t> | ||||
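| <t> | ||||
| As an illustration only, the release rules above might be expressed as two | ||||
| predicates. The ClientRecord type, its field names, and the ten-minute idle | ||||
| threshold are assumptions made for this sketch; the text says only "many minutes". | ||||
| </t> | ||||

```python
from dataclasses import dataclass

# Assumption: the specification does not fix a number for "many minutes".
IDLE_THRESHOLD_SECONDS = 600.0


@dataclass
class ClientRecord:
    """Hypothetical per-client bookkeeping kept by a server."""
    sessions: int = 0
    opens: int = 0
    locks: int = 0
    delegations: int = 0
    layouts: int = 0
    wants: int = 0
    idle_seconds: float = 0.0


def holds_state(c):
    """True if any state is still associated with the client ID."""
    return any((c.sessions, c.opens, c.locks, c.delegations, c.layouts, c.wants))


def may_release(c):
    """The server MUST NOT release the client ID while any state remains."""
    return not holds_state(c)


def should_release(c):
    """A cautious policy: release only stateless clients that have been idle a long time."""
    return may_release(c) and c.idle_seconds >= IDLE_THRESHOLD_SECONDS
```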
| </section> | ||||
| <!-- [auth] Server Release of Client ID --> | ||||
| <section anchor="cowner_conflicts" numbered="true" toc="default"> | ||||
| <name>Resolving Client Owner Conflicts</name> | ||||
| <t> | ||||
| When the server gets an EXCHANGE_ID for a client owner that | ||||
| currently has no state, or that has state but the lease has expired, | ||||
| the server <bcp14>MUST</bcp14> allow the | ||||
| EXCHANGE_ID and confirm the new client ID if followed by the | ||||
| appropriate CREATE_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| When the server gets an EXCHANGE_ID for a | ||||
| new incarnation of a client owner that | ||||
| currently has an old incarnation with state and an unexpired lease, the | ||||
| server is allowed to dispose of the state of the | ||||
| previous incarnation of the client owner if | ||||
| one of the following is true: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The principal that created the client ID for the client owner | ||||
| is the same as the principal that is sending the EXCHANGE_ID operation. | ||||
| Note that if the client ID was created with | ||||
| SP4_MACH_CRED state protection (<xref target="OP_EXCHANGE_ID" format="default"/>), | ||||
| the principal <bcp14>MUST</bcp14> be based on RPCSEC_GSS authentication, | ||||
| the RPCSEC_GSS service used <bcp14>MUST</bcp14> be integrity or | ||||
| privacy, and the | ||||
| same GSS mechanism and principal | ||||
| <bcp14>MUST</bcp14> be used as that used when the client ID | ||||
| was created. | ||||
| </li> | ||||
| <li> | ||||
| The client ID was established with SP4_SSV | ||||
| protection (<xref target="OP_EXCHANGE_ID" format="default"/>, | ||||
| <xref target="protect_state_change" format="default"/>) | ||||
| and the client sends the EXCHANGE_ID with the | ||||
| security flavor set to RPCSEC_GSS using the GSS | ||||
| SSV mechanism (<xref target="ssv_mech" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| The client ID was established with SP4_SSV | ||||
| protection, and under the conditions described herein, | ||||
| the EXCHANGE_ID was sent with SP4_MACH_CRED state protection. | ||||
| Because the SSV might not persist | ||||
| across client and server restart, and because | ||||
| the first time a client sends EXCHANGE_ID to | ||||
| a server it does not have an SSV, the client | ||||
| <bcp14>MAY</bcp14> send the subsequent EXCHANGE_ID without | ||||
| an SSV RPCSEC_GSS handle. Instead, as with | ||||
| SP4_MACH_CRED protection, the principal <bcp14>MUST</bcp14> be | ||||
| based on RPCSEC_GSS authentication, the RPCSEC_GSS | ||||
| service used <bcp14>MUST</bcp14> be integrity or privacy, and the | ||||
| same GSS mechanism and principal <bcp14>MUST</bcp14> be used as | ||||
| that used when the client ID was created. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If none of the above situations apply, the server | ||||
| <bcp14>MUST</bcp14> return NFS4ERR_CLID_INUSE. | ||||
| </t> | ||||
| <t> | ||||
| If the server accepts the principal and co_ownerid | ||||
| as matching that which created the client ID, and | ||||
| the co_verifier in the EXCHANGE_ID differs from the | ||||
| co_verifier used when the client ID was created, | ||||
| then after the server receives a CREATE_SESSION that | ||||
| confirms the client ID, the server deletes state. | ||||
| If the co_verifier values are the same (e.g., the | ||||
| client either is updating properties of the client ID | ||||
| (<xref target="OP_EXCHANGE_ID" format="default"/>) or | ||||
| is attempting trunking (<xref target="Trunking" format="default"/>)), | ||||
| the server <bcp14>MUST NOT</bcp14> delete state. | ||||
| </t> | ||||
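| <t> | ||||
| The decision procedure in this section can be sketched as below. This is a | ||||
| simplified model under stated assumptions: the Cred and ClientRecord types, the | ||||
| string labels, and the "ssv" mechanism name are hypothetical, and the sketch | ||||
| assumes that a client ID created with SP4_SSV is matched only by the two | ||||
| SSV-related conditions. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    """Hypothetical summary of the credential used on an EXCHANGE_ID."""
    principal: str
    gss: bool = False        # True if RPCSEC_GSS authentication was used
    service: str = "none"    # "none", "integrity", or "privacy"
    mech: str = ""           # GSS mechanism label; "ssv" stands for the SSV mechanism


@dataclass
class ClientRecord:
    """Hypothetical server record for an existing client owner."""
    creator: Cred
    protection: str          # "SP4_NONE", "SP4_MACH_CRED", or "SP4_SSV"
    verifier: bytes
    has_state: bool
    lease_expired: bool


def _strong_match(rec, cred):
    # RPCSEC_GSS with integrity or privacy, same mechanism and principal as at creation.
    return (cred.gss and cred.service in ("integrity", "privacy")
            and cred.mech == rec.creator.mech
            and cred.principal == rec.creator.principal)


def principal_accepted(rec, cred):
    if rec.protection == "SP4_SSV":
        return (cred.gss and cred.mech == "ssv") or _strong_match(rec, cred)
    if rec.protection == "SP4_MACH_CRED":
        return _strong_match(rec, cred)
    return cred.principal == rec.creator.principal


def resolve(rec, cred, verifier):
    """Return a label describing how the server treats the EXCHANGE_ID."""
    if not rec.has_state or rec.lease_expired:
        return "ALLOW"
    if not principal_accepted(rec, cred):
        return "NFS4ERR_CLID_INUSE"
    if verifier != rec.verifier:
        return "DELETE_STATE_ON_CONFIRM"   # applied once CREATE_SESSION confirms
    return "KEEP_STATE"
```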
| </section> | ||||
| <!-- [auth] Handling Client Owner Conflicts --> | ||||
| </section> | ||||
| <!-- [auth] Client Identifiers --> | ||||
| <section anchor="Server_Owners" numbered="true" toc="default"> | ||||
| <name>Server Owners</name> | ||||
| <t> | ||||
| The server owner is similar to a client owner | ||||
| (<xref target="Client_Identifiers" format="default"/>), but unlike the | ||||
| client owner, there is no shorthand server ID. | ||||
| The server owner is defined in the following data type: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct server_owner4 { | ||||
| uint64_t so_minor_id; | ||||
| opaque so_major_id<NFS4_OPAQUE_LIMIT>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The server owner is returned from | ||||
| EXCHANGE_ID. When the so_major_id fields are the same in | ||||
| two EXCHANGE_ID results, the connections over which the EXCHANGE_ID | ||||
| operations were sent can be assumed to address the same server | ||||
| (as defined in <xref target="intro_definitions" format="default"/>). If | ||||
| the so_minor_id fields are also the same, then not only | ||||
| do both connections connect to the same server, but the | ||||
| session can be shared across both | ||||
| connections. The reader is cautioned that multiple | ||||
| servers may deliberately or accidentally claim to have | ||||
| the same so_major_id or so_major_id/so_minor_id; the | ||||
| reader should examine Sections <xref target="Trunking" format="counter"/> and | ||||
| <xref target="OP_EXCHANGE_ID" format="counter"/> in order to avoid | ||||
| acting on falsely matching server owner values. | ||||
| </t> | ||||
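| <t> | ||||
| A minimal sketch of the comparison described above follows. The ServerOwner | ||||
| tuple mirrors the fields of server_owner4, while the function name and the | ||||
| returned labels are hypothetical. A match is only a hint: as cautioned above, it | ||||
| still has to be verified before the client acts on it. | ||||
| </t> | ||||

```python
from typing import NamedTuple


class ServerOwner(NamedTuple):
    so_minor_id: int
    so_major_id: bytes


def compare_server_owners(a, b):
    """Classify two EXCHANGE_ID results by their server owner values."""
    if a.so_major_id != b.so_major_id:
        return "different-server"
    if a.so_minor_id != b.so_minor_id:
        return "same-server"
    # Both fields match: a session may be shareable across both connections.
    return "same-server-session-shareable"
```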
| <t> | ||||
| The considerations for generating an so_major_id are | ||||
| similar to those for generating a co_ownerid string (see | ||||
| <xref target="Client_Identifiers" format="default"/>). The consequences | ||||
| of two servers generating conflicting so_major_id values | ||||
| are less dire than they are for co_ownerid conflicts | ||||
| because the client can use RPCSEC_GSS to compare the | ||||
| authenticity of each server | ||||
| (see <xref target="Trunking" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Server Owners --> | ||||
| <section anchor="Security_Service_Negotiation" numbered="true" toc="default"> | ||||
| <name>Security Service Negotiation</name> | ||||
| <t> | ||||
| With the NFSv4.1 server potentially offering | ||||
| multiple security mechanisms, the client needs a method | ||||
| to determine or negotiate which mechanism is to be | ||||
| used for its communication with the server. The NFS | ||||
| server may have multiple points within its file system | ||||
| namespace that are available for use by NFS clients. | ||||
| These points can be considered security policy boundaries, | ||||
| and, in some NFS implementations, are tied to NFS export points. | ||||
| In turn, the NFS server may be configured such that each | ||||
| of these security policy boundaries may have different or multiple | ||||
| security mechanisms in use. | ||||
| </t> | ||||
| <t> | ||||
| The security negotiation between client and server | ||||
| <bcp14>SHOULD</bcp14> be done with a secure channel to eliminate | ||||
| the possibility of a third party intercepting the | ||||
| negotiation sequence and forcing the client and server | ||||
| to choose a lower level of security than required or | ||||
| desired. See | ||||
| <xref target="SECCON" format="default"/> for further discussion. | ||||
| </t> | ||||
| <section anchor="NFSv4_Security_Tuples" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Security Tuples</name> | ||||
| <t> | ||||
| An NFS server can assign one or more "security tuples" to each | ||||
| security policy boundary in its namespace. Each security tuple | ||||
| consists of a security flavor | ||||
| (see <xref target="RPC_Security_Flavors" format="default"/>) and, if the flavor | ||||
| is RPCSEC_GSS, a GSS-API mechanism Object Identifier (OID), a GSS-API quality of | ||||
| protection, and an RPCSEC_GSS service. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] NFSv4.1 Security Tuples --> | ||||
| <section anchor="SECINFO_and_SECINFO_NO_NAME" numbered="true" toc="default"> | ||||
| <name>SECINFO and SECINFO_NO_NAME</name> | ||||
| <t> | ||||
| The SECINFO and SECINFO_NO_NAME operations allow the client to | ||||
| determine, on a per-filehandle basis, what security tuple is to be | ||||
| used for server access. In general, the client will not have to | ||||
| use either operation except during initial communication with the | ||||
| server or when the client crosses security policy boundaries at the | ||||
| server. However, the server's policies may also change at any time | ||||
| and force the client to negotiate a new security tuple. | ||||
| </t> | ||||
| <t> | ||||
| Where the use of different security tuples would affect the type of | ||||
| access (e.g., read-only vs. read-write) that would be allowed if a | ||||
| request was sent over the same connection used for the SECINFO or | ||||
| SECINFO_NO_NAME operation, security tuples that allow | ||||
| greater access should be presented first. Where the general level | ||||
| of access is the same and different security flavors limit the | ||||
| range of principals whose privileges are recognized (e.g., allowing | ||||
| or disallowing root access), flavors supporting the greatest range | ||||
| of principals should be listed first. | ||||
| </t> | ||||
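| <t> | ||||
| The ordering guidance above amounts to a two-key sort. In the sketch below, | ||||
| the RankedTuple type and its numeric ranks are hypothetical; the protocol does not | ||||
| define such ranks, and a server would derive them from its own policy. | ||||
| </t> | ||||

```python
from typing import NamedTuple


class RankedTuple(NamedTuple):
    name: str
    access_level: int      # hypothetical rank: higher means greater access
    principal_range: int   # hypothetical rank: higher means a wider range of principals


def secinfo_order(tuples):
    """Greater access first; among equal access, the wider principal range first."""
    return sorted(tuples, key=lambda t: (-t.access_level, -t.principal_range))


offered = [
    RankedTuple("sys-readonly", 1, 2),
    RankedTuple("krb5-noroot", 2, 1),
    RankedTuple("krb5p-root", 2, 2),
]
ordered_names = [t.name for t in secinfo_order(offered)]
```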
| </section> | ||||
| <!-- [auth] SECINFO and SECINFO_NO_NAME --> | ||||
| <section anchor="Security_Error" numbered="true" toc="default"> | ||||
| <name>Security Error</name> | ||||
| <t> | ||||
| Based on the assumption that each NFSv4.1 client | ||||
| and server <bcp14>MUST</bcp14> support a minimum set of security (i.e., | ||||
| Kerberos V5 under RPCSEC_GSS), | ||||
| the NFS client will initiate file access to the server | ||||
| with one of the minimal security tuples. During | ||||
| communication with the server, the client may receive an | ||||
| NFS error of NFS4ERR_WRONGSEC. This error allows the | ||||
| server to notify the client that the security tuple | ||||
| currently being used contravenes the server's | ||||
| security policy. The client is then responsible for | ||||
| determining (see <xref target="using_secinfo" format="default"/>) what | ||||
| security tuples are available at the server and choosing | ||||
| one that is appropriate for the client. | ||||
| </t> | ||||
| <section anchor="using_secinfo" numbered="true" toc="default"> | ||||
| <name>Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME</name> | ||||
| <t> | ||||
| This section explains the mechanics of NFSv4.1 security negotiation. | ||||
| </t> | ||||
| <section anchor="putfh_series" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operations</name> | ||||
| <t> | ||||
| The term "put filehandle operation" refers to | ||||
| PUTROOTFH, PUTPUBFH, PUTFH, and RESTOREFH. Each of the subsections | ||||
| herein describes how the server handles a subseries of operations | ||||
| that starts with a put filehandle operation. | ||||
| </t> | ||||
| <section anchor="PUTFHplusSAVEFH" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + SAVEFH</name> | ||||
| <t> | ||||
| The client is saving a filehandle for a future | ||||
| RESTOREFH, LINK, or RENAME. SAVEFH <bcp14>MUST NOT</bcp14> | ||||
| return NFS4ERR_WRONGSEC. To determine whether or not the put | ||||
| filehandle operation returns NFS4ERR_WRONGSEC, | ||||
| the server implementation pretends SAVEFH is not in | ||||
| the series of operations and examines which of the | ||||
| situations described in the other subsections of <xref target="putfh_series" format="default"/> apply. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Put Filehandle Operation + SAVEFH --> | ||||
| <section anchor="PUTFHplusPUTFH" numbered="true" toc="default"> | ||||
| <name>Two or More Put Filehandle Operations</name> | ||||
| <t> | ||||
| For a series of N put filehandle operations, the server | ||||
| <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC to the first N-1 put | ||||
| filehandle operations. | ||||
| The Nth put filehandle operation | ||||
| is handled as if it is the first in a subseries of | ||||
| operations. | ||||
| For example, if the | ||||
| server received a COMPOUND request with this series of | ||||
| operations -- PUTFH, PUTROOTFH, LOOKUP -- then the | ||||
| PUTFH operation is ignored for NFS4ERR_WRONGSEC purposes, and the | ||||
| PUTROOTFH, LOOKUP subseries is processed according | ||||
| to <xref target="PUTFHplusLOOKUP" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] PUTFH + PUTFH --> | ||||
| <section anchor="PUTFHplusLOOKUP" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + LOOKUP (or OPEN of an Existing Name)</name> | ||||
| <t> | ||||
| This situation also applies to a put filehandle operation followed | ||||
| by a LOOKUP or an OPEN operation that specifies an existing component name. | ||||
| </t> | ||||
| <t> | ||||
| In this situation, the client is potentially crossing | ||||
| a security policy boundary, and the set of security tuples | ||||
| the parent directory supports may differ from those of | ||||
| the child. | ||||
| The server implementation may decide whether to impose | ||||
| any restrictions on security policy administration. | ||||
| There are at least three approaches (sec_policy_child is | ||||
| the tuple set of the child export, sec_policy_parent is | ||||
| that of the parent). | ||||
| </t> | ||||
| <ol spacing="normal" type="(%c)"> | ||||
| <li> | ||||
| sec_policy_child &lt;= sec_policy_parent (&lt;= for subset). This | ||||
| means that the set of security tuples specified on the | ||||
| security policy of a child directory is always a subset | ||||
| of its parent directory. | ||||
| </li> | ||||
| <li> | ||||
| sec_policy_child ^ sec_policy_parent != {} (^ for intersection, {} | ||||
| for the empty set). This means that the set of security tuples specified | ||||
| on the security policy of a child directory always has a non-empty intersection | ||||
| with that of the parent. | ||||
| </li> | ||||
| <li> | ||||
| sec_policy_child ^ sec_policy_parent == {}. This means that the | ||||
| set of security tuples specified on the security policy of a child directory | ||||
| may not intersect with that of the parent. In other words, there | ||||
| are no restrictions on how the system administrator may | ||||
| set up these tuples. | ||||
| </li> | ||||
| </ol> | ||||
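| <t> | ||||
| The three relations can be illustrated with ordinary set operations. The function | ||||
| name and the flavor labels used as set members are hypothetical; the function | ||||
| reports the most restrictive approach whose condition a given pair of policies | ||||
| satisfies. | ||||
| </t> | ||||

```python
def strictest_approach(sec_policy_child, sec_policy_parent):
    """Return "a", "b", or "c" for a pair of security tuple sets."""
    if sec_policy_child.issubset(sec_policy_parent):
        return "a"   # child is a subset of parent
    if not sec_policy_child.isdisjoint(sec_policy_parent):
        return "b"   # non-empty intersection
    return "c"       # no common tuple; only the unrestricted approach allows this
```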
| <t> | ||||
| In order for a server to support approaches (b) | ||||
| (for the case when a client chooses a flavor that is | ||||
| not a member of sec_policy_parent) and (c), the put | ||||
| filehandle operation cannot return NFS4ERR_WRONGSEC | ||||
| when there is a security tuple mismatch. Instead, | ||||
| it should be returned from the LOOKUP (or OPEN by | ||||
| existing component name) that follows. | ||||
| </t> | ||||
| <t> | ||||
| Since the above guideline does not contradict approach | ||||
| (a), it should be followed in general. Even if approach | ||||
| (a) is implemented, it is possible for the security | ||||
| tuple used to be acceptable for the target of LOOKUP | ||||
| but not for the filehandle used in the put filehandle operation. The | ||||
| put filehandle operation | ||||
| could be a PUTROOTFH or PUTPUBFH, where the | ||||
| client cannot know the security tuples for the root | ||||
| or public filehandle. Or the security policy for the | ||||
| filehandle used by the put filehandle operation | ||||
| could have changed since the | ||||
| time the filehandle was obtained. | ||||
| </t> | ||||
| <t> | ||||
| Therefore, an NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC | ||||
| in response to the put filehandle operation | ||||
| if the operation | ||||
| is immediately followed by a LOOKUP or an OPEN by component name. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] PUTFH + LOOKUP --> | ||||
| <section anchor="PUTFHplusLOOKUPP" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + LOOKUPP</name> | ||||
| <t> | ||||
| Since SECINFO only works its way down, there is no way LOOKUPP can | ||||
| return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME | ||||
| solves this issue via style | ||||
| SECINFO_STYLE4_PARENT, which works in the opposite direction from SECINFO. | ||||
| As with <xref target="PUTFHplusLOOKUP" format="default"/>, a put filehandle operation | ||||
| that is followed by a LOOKUPP | ||||
| <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC. | ||||
| If the server does not support SECINFO_NO_NAME, the client's | ||||
| only recourse is to send the put filehandle operation, | ||||
| LOOKUPP, GETFH sequence | ||||
| of operations with every security tuple it supports. | ||||
| </t> | ||||
| <t> | ||||
| Regardless of whether SECINFO_NO_NAME is supported, an | ||||
| NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC in | ||||
| response to a put filehandle operation if the | ||||
| operation is immediately followed by a LOOKUPP. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] PUTFH + LOOKUPP --> | ||||
| <section anchor="PUTFHplusSECINFO" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + SECINFO/SECINFO_NO_NAME</name> | ||||
| <t> | ||||
| A security-sensitive client is allowed to choose | ||||
| a strong security tuple when querying a server to | ||||
| determine a file object's permitted security tuples. | ||||
| The security tuple chosen by the client does not have | ||||
| to be included in the tuple list of the security policy | ||||
| of either the parent directory indicated in the put filehandle | ||||
| operation or the child file object indicated in SECINFO (or any parent directory | ||||
| indicated in SECINFO_NO_NAME). Of course, the server has to be | ||||
| configured for whatever security | ||||
| tuple the client selects; otherwise, the request will | ||||
| fail at the RPC layer with an appropriate authentication error. | ||||
| </t> | ||||
| <t> | ||||
| In theory, there is no connection between the security | ||||
| flavor used by SECINFO or SECINFO_NO_NAME and those | ||||
| supported by the security policy. But in practice, the | ||||
| client may start looking for strong flavors from those | ||||
| supported by the security policy, followed by those in | ||||
| the <bcp14>REQUIRED</bcp14> set. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC to a | ||||
| put filehandle operation that | ||||
| is immediately followed by SECINFO or SECINFO_NO_NAME. | ||||
| The NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC from SECINFO or | ||||
| SECINFO_NO_NAME. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] PUTFH + SECINFO --> | ||||
| <section anchor="PUTFHplusNothing" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + Nothing</name> | ||||
| <t> | ||||
| The NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] PUTFH + Nothing --> | ||||
| <section anchor="PUTFHplusAnythingElse" numbered="true" toc="default"> | ||||
| <name>Put Filehandle Operation + Anything Else</name> | ||||
| <t> | ||||
| "Anything Else" includes OPEN by filehandle. | ||||
| </t> | ||||
| <t> | ||||
| The security policy enforcement applies to the | ||||
| filehandle specified in the put filehandle operation. Therefore, the | ||||
| put filehandle operation <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_WRONGSEC when there is a security tuple | ||||
| mismatch. This avoids the complexity of | ||||
| adding NFS4ERR_WRONGSEC as an allowable error to every | ||||
| other operation. | ||||
| </t> | ||||
| <t> | ||||
| A COMPOUND containing the series put filehandle | ||||
| operation + SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an | ||||
| efficient way for the client to recover from | ||||
| NFS4ERR_WRONGSEC. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 server <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC to | ||||
| any operation other than a put filehandle operation, | ||||
| LOOKUP, LOOKUPP, and OPEN (by component name). | ||||
| </t> | ||||
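| <t> | ||||
| The rules for put filehandle operations in the preceding subsections can be | ||||
| summarized in one function. This is a sketch under assumptions: operations are | ||||
| modeled as plain strings, "OPEN_BY_NAME" is a hypothetical label for an OPEN that | ||||
| specifies a component name, and any other operation name counts as "anything else". | ||||
| </t> | ||||

```python
PUT_FH_OPS = {"PUTROOTFH", "PUTPUBFH", "PUTFH", "RESTOREFH"}
# Operations that take over (or rule out) the NFS4ERR_WRONGSEC check.
EXEMPTING_OPS = {"LOOKUP", "OPEN_BY_NAME", "LOOKUPP", "SECINFO", "SECINFO_NO_NAME"}


def putfh_checks_wrongsec(ops, index):
    """True if the put filehandle op at ops[index] must report a tuple mismatch."""
    if ops[index] not in PUT_FH_OPS:
        raise ValueError("not a put filehandle operation")
    # The server pretends SAVEFH is not in the series.
    following = [op for op in ops[index + 1:] if op != "SAVEFH"]
    if not following:
        return False              # put filehandle + nothing
    nxt = following[0]
    if nxt in PUT_FH_OPS:
        return False              # one of the first N-1 in a run of put operations
    if nxt in EXEMPTING_OPS:
        return False              # LOOKUP, OPEN by name, LOOKUPP, SECINFO variants
    return True                   # anything else: enforce on the put operation
```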
| </section> | ||||
| <!-- [auth] PUTFH + Anything Else --> | ||||
| <section anchor="aftersecinfo" numbered="true" toc="default"> | ||||
| <name>Operations after SECINFO and SECINFO_NO_NAME</name> | ||||
| <t> | ||||
| Suppose a client sends a COMPOUND procedure | ||||
| containing the series SEQUENCE, PUTFH, | ||||
| SECINFO_NO_NAME, READ, and suppose the security tuple | ||||
| used does not match that required for the target | ||||
| file. By rule (see <xref target="PUTFHplusSECINFO" format="default"/>), | ||||
| neither PUTFH nor SECINFO_NO_NAME can | ||||
| return NFS4ERR_WRONGSEC. By rule (see <xref target="PUTFHplusAnythingElse" format="default"/>), READ cannot return | ||||
| NFS4ERR_WRONGSEC. The issue is resolved by the fact | ||||
| that SECINFO and SECINFO_NO_NAME consume the current | ||||
| filehandle (note that this is a change from NFSv4.0). This leaves no current filehandle for | ||||
| READ to use, and READ returns NFS4ERR_NOFILEHANDLE. | ||||
| </t> | ||||
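| <t> | ||||
| The effect of consuming the current filehandle can be seen in a toy evaluator. | ||||
| The run_compound function and its handling of only four operation names are | ||||
| assumptions made for illustration; security checks are deliberately left out. | ||||
| </t> | ||||

```python
def run_compound(ops):
    """Evaluate ops in order and stop at the first error, as COMPOUND does."""
    current_fh = None
    results = []
    for op in ops:
        status = "NFS4_OK"
        if op == "PUTFH":
            current_fh = "target-file"
        elif op in ("SECINFO", "SECINFO_NO_NAME", "READ"):
            if current_fh is None:
                status = "NFS4ERR_NOFILEHANDLE"
            elif op != "READ":
                # SECINFO and SECINFO_NO_NAME consume the current filehandle.
                current_fh = None
        results.append((op, status))
        if status != "NFS4_OK":
            break
    return results
```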
| </section> | ||||
| <!-- [auth] Operations after SECINFO and SECINFO_NO_NAME" --> | ||||
| </section> | ||||
| <section anchor="link_rename" numbered="true" toc="default"> | ||||
| <name>LINK and RENAME</name> | ||||
| <t> | ||||
| The LINK and RENAME operations use both the current | ||||
| and saved filehandles. | ||||
| Technically, the server <bcp14>MAY</bcp14> return NFS4ERR_WRONGSEC from | ||||
| LINK or RENAME | ||||
| if the security policy of the | ||||
| saved filehandle rejects the security flavor used in the | ||||
| COMPOUND request's credentials. If the server does so, | ||||
| then if there is no intersection between the security | ||||
| policies of saved and current filehandles, this means that it | ||||
| will be impossible for the client to perform the intended | ||||
| LINK or RENAME operation. | ||||
| </t> | ||||
| <t> | ||||
| For example, suppose the client sends this COMPOUND | ||||
| request: SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, | ||||
| RENAME "c" "d", where filehandles bFH and aFH refer | ||||
| to different directories. Suppose no common security | ||||
| tuple exists between the security policies of aFH and | ||||
| bFH. If the client sends the request using credentials | ||||
| acceptable to bFH's security policy but not aFH's | ||||
| policy, then the PUTFH aFH operation will fail with | ||||
| NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME request, | ||||
| the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH | ||||
| aFH, RENAME "c" "d", using credentials acceptable to | ||||
| aFH's security policy but not bFH's policy. The server | ||||
| returns NFS4ERR_WRONGSEC on the RENAME operation. | ||||
| </t> | ||||
| <t> | ||||
| To prevent a client from entering an endless sequence of a | ||||
| request containing LINK or RENAME, followed by a request | ||||
| containing SECINFO_NO_NAME or SECINFO, the server <bcp14>MUST</bcp14> detect | ||||
| when the security policies of the current and saved | ||||
| filehandles have no mutually acceptable security tuple, | ||||
| and <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC from LINK or RENAME | ||||
| in that situation. Instead | ||||
| the server <bcp14>MUST</bcp14> do one of two things: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The server can return NFS4ERR_XDEV. | ||||
| </li> | ||||
| <li> | ||||
| The server can | ||||
| allow the security policy of the current filehandle to | ||||
| override that of the saved filehandle, and so return NFS4_OK. | ||||
| </li> | ||||
| </ul> | ||||
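| <t> | ||||
| A sketch of this rule follows. The function name, the set-of-labels | ||||
| representation of a security policy, and the override flag are hypothetical. | ||||
| The sketch assumes the put filehandle operation has already accepted the tuple | ||||
| for the current filehandle, and it models a server that chooses to return | ||||
| NFS4ERR_WRONGSEC whenever that is permitted. | ||||
| </t> | ||||

```python
def link_rename_status(saved_policy, current_policy, used_tuple, override=False):
    """Status a server might return from LINK or RENAME for the saved filehandle."""
    if used_tuple in saved_policy:
        return "NFS4_OK"
    if saved_policy.isdisjoint(current_policy):
        # No mutually acceptable tuple: NFS4ERR_WRONGSEC is forbidden here.
        return "NFS4_OK" if override else "NFS4ERR_XDEV"
    # A common tuple exists, so the client can retry after SECINFO.
    return "NFS4ERR_WRONGSEC"
```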
| </section> | ||||
| </section> | ||||
| <!-- [auth] Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME --> | ||||
| </section> | ||||
| <!-- [auth] Security Error --> | ||||
| </section> | ||||
| <!-- [auth] Security Service Negotiation --> | ||||
| <section anchor="minor_versioning" numbered="true" toc="default"> | ||||
| <name>Minor Versioning</name> | ||||
| <t> | ||||
| To address the requirement of an NFS protocol that can evolve as the | ||||
| need arises, the NFSv4.1 protocol contains the rules and | ||||
| framework to allow for future minor changes or versioning. | ||||
| </t> | ||||
| <t> | ||||
| The base assumption with respect to minor versioning is that any | ||||
| future accepted minor version will be | ||||
| documented in one or more Standards Track RFCs. | ||||
| Minor version 0 of the NFSv4 protocol is represented by | ||||
| <xref target="RFC3530" format="default"/>, and minor version 1 is represented by | ||||
| this RFC. | ||||
| The COMPOUND and CB_COMPOUND | ||||
| procedures support the encoding of the minor version | ||||
| being requested by the client. | ||||
| </t> | ||||
| <t> | ||||
| The following items represent the basic rules for the development of | ||||
| minor versions. Note that a future minor version may modify | ||||
| or add to the following rules as part of the minor version definition. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| <t> | ||||
| Procedures are not added or deleted. | ||||
| </t> | ||||
| <t> | ||||
| To maintain the general RPC model, NFSv4 minor versions will | ||||
| not add to or delete procedures from the NFS program. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Minor versions may add operations to the COMPOUND and CB_COMPOUND | ||||
| procedures. | ||||
| </t> | ||||
| <t> | ||||
| The addition of operations to the COMPOUND and CB_COMPOUND procedures | ||||
| does not affect the RPC model. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Minor versions may append attributes to the bitmap4 that represents | ||||
| sets of attributes and to the fattr4 that represents sets of attribute | ||||
| values. | ||||
| </t> | ||||
| <t> | ||||
| This allows for the expansion of the attribute model to allow for | ||||
| future growth or adaptation. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Minor version X must append any new attributes after the last | ||||
| documented attribute. | ||||
| </t> | ||||
| <t> | ||||
| Since attribute results are specified as an opaque array of | ||||
| per-attribute, XDR-encoded results, the complexity of adding new | ||||
| attributes in the midst of the current definitions would be too | ||||
| burdensome. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Minor versions must not modify the structure of an existing | ||||
| operation's arguments or results. | ||||
| </t> | ||||
| <t> | ||||
| Again, the complexity of handling multiple structure definitions for a | ||||
| single operation is too burdensome. New operations should be added | ||||
| instead of modifying existing structures for a minor version. | ||||
| </t> | ||||
| <t> | ||||
| This rule does not preclude the following adaptations in a minor version: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| adding bits to flag fields, such as new attributes to GETATTR's bitmap4 | ||||
| data type, and providing corresponding variants of opaque arrays, | ||||
| such as a notify4 used together with such bitmaps | ||||
| </li> | ||||
| <li> | ||||
| adding bits to existing attributes like ACLs that have flag words | ||||
| </li> | ||||
| <li> | ||||
| extending enumerated types (including NFS4ERR_*) with new values | ||||
| </li> | ||||
| <li> | ||||
| adding cases to a switched union | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| Minor versions must not modify the structure of existing attributes. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Minor versions must not delete operations. | ||||
| </t> | ||||
| <t> | ||||
| This prevents the potential reuse of a particular operation "slot" in | ||||
| a future minor version. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Minor versions must not delete attributes. | ||||
| </li> | ||||
| <li> | ||||
| Minor versions must not delete flag bits or enumeration values. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Minor versions may declare an operation <bcp14>MUST NOT</bcp14> be implemented. | ||||
| </t> | ||||
| <t> | ||||
| Specifying that an operation <bcp14>MUST NOT</bcp14> be implemented is equivalent | ||||
| to obsoleting an operation. For the client, it means that the | ||||
| operation <bcp14>MUST NOT</bcp14> be sent to the server. For the server, an NFS | ||||
| error can be returned as opposed to "dropping" the request as an XDR | ||||
| decode error. This approach allows for the obsolescence of an | ||||
| operation while maintaining its structure so that a future minor version can reintroduce the operation. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Minor versions may declare that an attribute <bcp14>MUST NOT</bcp14> be implemented. | ||||
| </li> | ||||
| <li> | ||||
| Minor versions may declare that a flag bit or enumeration value <bcp14>MUST NOT</bcp14> | ||||
| be implemented. | ||||
| </li> | ||||
| </ol> | ||||
| </li> | ||||
| <li> | ||||
| Minor versions may downgrade features from <bcp14>REQUIRED</bcp14> to <bcp14>RECOMMENDED</bcp14>, | ||||
| or <bcp14>RECOMMENDED</bcp14> to <bcp14>OPTIONAL</bcp14>. | ||||
| </li> | ||||
| <li> | ||||
| Minor versions may upgrade features from <bcp14>OPTIONAL</bcp14> to <bcp14>RECOMMENDED</bcp14>, or | ||||
| <bcp14>RECOMMENDED</bcp14> to <bcp14>REQUIRED</bcp14>. | ||||
| </li> | ||||
| <li> | ||||
| A client and server that support minor version X <bcp14>SHOULD</bcp14> support minor | ||||
| versions zero through X-1 as well. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Except for infrastructural changes, a minor version must not | ||||
| introduce <bcp14>REQUIRED</bcp14> new features. | ||||
| </t> | ||||
| <t> | ||||
| This rule allows for the introduction of new functionality and forces | ||||
| the use of implementation experience before designating a feature as | ||||
| <bcp14>REQUIRED</bcp14>. On the other hand, some classes of features are | ||||
| infrastructural and have broad effects. Allowing infrastructural features | ||||
| to be <bcp14>RECOMMENDED</bcp14> or <bcp14>OPTIONAL</bcp14> complicates implementation of the minor version. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| A client <bcp14>MUST NOT</bcp14> attempt to use a stateid, filehandle, or similar | ||||
| returned object from the COMPOUND procedure with minor version X for | ||||
| another COMPOUND procedure with minor version Y, where X != Y. | ||||
| </li> | ||||
| </ol> | ||||
| </section> | ||||
| <!-- [auth] Minor Versioning --> | ||||
| <section anchor="Non-RPC-based_Security_Services" numbered="true" toc="default"> | ||||
| <name>Non-RPC-Based Security Services</name> | ||||
| <t> | ||||
| As described in <xref target="Authentication_Integrity_Privacy" format="default"/>, | ||||
| NFSv4.1 relies on RPC for identification, | ||||
| authentication, integrity, and privacy. NFSv4.1 itself | ||||
| provides or enables additional security services as described in the | ||||
| next several subsections. | ||||
| </t> | ||||
| <section anchor="Authorization" numbered="true" toc="default"> | ||||
| <name>Authorization</name> | ||||
| <t> | ||||
| Authorization to access a file object via an NFSv4.1 | ||||
| operation is ultimately determined by the NFSv4.1 | ||||
| server. A client can predetermine its access to a file | ||||
| object via the OPEN (<xref target="OP_OPEN" format="default"/>) | ||||
| and the ACCESS (<xref target="OP_ACCESS" format="default"/>) | ||||
| operations. | ||||
| </t> | ||||
| <t> | ||||
| Principals with appropriate access rights can modify the | ||||
| authorization on a file object via the SETATTR | ||||
| (<xref target="OP_SETATTR" format="default"/>) operation. Attributes that affect | ||||
| access rights include mode, owner, owner_group, acl, dacl, and | ||||
| sacl. See <xref target="file_attributes" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Authorization --> | ||||
| <section anchor="Auditing" numbered="true" toc="default"> | ||||
| <name>Auditing</name> | ||||
| <t> | ||||
| NFSv4.1 provides auditing on a per-file object basis, via the acl | ||||
| and sacl attributes as described in <xref target="acl" format="default"/>. It is | ||||
| outside the scope of this specification to specify audit log | ||||
| formats or management policies. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Auditing --> | ||||
| <section anchor="Intrusion_Detection" numbered="true" toc="default"> | ||||
| <name>Intrusion Detection</name> | ||||
| <t> | ||||
| NFSv4.1 provides alarm control on a per-file object basis, via the | ||||
| acl and sacl attributes as described in <xref target="acl" format="default"/>. | ||||
| Alarms may serve as the basis for intrusion detection. It is | ||||
| outside the scope of this specification to specify heuristics for | ||||
| detecting intrusion via alarms. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Intrusion Detection --> | ||||
| </section> | ||||
| <!-- [auth] Non-RPC-based Security Services --> | ||||
| <section anchor="Transport_Layers" numbered="true" toc="default"> | ||||
| <name>Transport Layers</name> | ||||
| <section anchor="Required_and_Recommended_Transport_Attributes" numbered="true" toc="default"> | ||||
| <name><bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> Properties of Transports</name> | ||||
| <t> | ||||
| NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA-based transports with | ||||
| the following attributes: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The transport supports reliable delivery of data, which | ||||
| NFSv4.1 requires but neither NFSv4.1 nor RPC has facilities | ||||
| for ensuring <xref target="Chet" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| The transport delivers data in the order it was sent. | ||||
| Ordered delivery simplifies detection of transmit | ||||
| errors, and simplifies the sending of arbitrarily sized | ||||
| requests and responses via the record marking | ||||
| protocol <xref target="RFC5531" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Where an NFSv4.1 implementation supports operation | ||||
| over the IP network protocol, any transport used between | ||||
| NFS and IP <bcp14>MUST</bcp14> be among the IETF-approved congestion | ||||
| control transport protocols. At the time this document | ||||
| was written, the only two transports that had the above | ||||
| attributes were TCP and the Stream | ||||
| Control Transmission Protocol (SCTP). To enhance the | ||||
| possibilities for interoperability, an NFSv4.1 | ||||
| implementation <bcp14>MUST</bcp14> support operation over the TCP | ||||
| transport protocol. | ||||
| </t> | ||||
| <t> | ||||
| Even if NFSv4.1 is used over a non-IP network | ||||
| protocol, it is <bcp14>RECOMMENDED</bcp14> that the transport support | ||||
| congestion control. | ||||
| </t> | ||||
| <t> | ||||
| It is permissible for a connectionless transport to | ||||
| be used under NFSv4.1; however, reliable and in-order | ||||
| delivery of data combined with congestion control | ||||
| by the connectionless transport is | ||||
| <bcp14>REQUIRED</bcp14>. As a consequence, UDP by itself <bcp14>MUST NOT</bcp14> be used | ||||
| as an NFSv4.1 transport. NFSv4.1 assumes that a client transport | ||||
| address and server transport address used to send data | ||||
| over a transport together constitute a connection, | ||||
| even if the underlying transport eschews the concept | ||||
| of a connection. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Required and Recommended Transport Attributes --> | ||||
| <section anchor="Client_and_Server_Transport_Behavior" numbered="true" toc="default"> | ||||
| <name>Client and Server Transport Behavior</name> | ||||
| <t> | ||||
| If a connection-oriented transport (e.g., TCP) is used, | ||||
| the client and server <bcp14>SHOULD</bcp14> use long-lived connections | ||||
| for at least three reasons: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| This will prevent the weakening of the transport's | ||||
| congestion control mechanisms via short-lived | ||||
| connections. | ||||
| </li> | ||||
| <li> | ||||
| This will improve performance for the WAN environment | ||||
| by eliminating the need for connection setup | ||||
| handshakes. | ||||
| </li> | ||||
| <li> | ||||
| The NFSv4.1 callback model differs from NFSv4.0, and | ||||
| requires the client and server to maintain a | ||||
| client-created backchannel (see <xref target="conn_chann_assoc" format="default"/>) for the server to use. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| In order to reduce congestion, if a connection-oriented | ||||
| transport is used, and the request is not the NULL | ||||
| procedure: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A requester <bcp14>MUST NOT</bcp14> retry a request unless the connection the request | ||||
| was sent over was lost before the reply was | ||||
| received. | ||||
| </li> | ||||
| <li> | ||||
| A replier <bcp14>MUST | ||||
| NOT</bcp14> silently drop a request, even if the request is a | ||||
| retry. (The silent drop behavior of RPCSEC_GSS | ||||
| <xref target="RFC2203" format="default"/> does not apply | ||||
| because this behavior happens at the RPCSEC_GSS layer, | ||||
| a lower layer in the request processing.) Instead, the | ||||
| replier <bcp14>SHOULD</bcp14> return an appropriate error (see | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/>), | ||||
| or it <bcp14>MAY</bcp14> disconnect the connection. | ||||
| </li> | ||||
| </ul> | ||||
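The requester-side rule above can be sketched as a small predicate. This is an illustrative sketch only, not part of the protocol definition; the function name `retry_permitted` and its three boolean parameters are hypothetical and do not appear in this specification.

```python
def retry_permitted(is_null_procedure: bool,
                    connection_lost: bool,
                    reply_received: bool) -> bool:
    """Hypothetical helper: is a retry consistent with the requester rule?"""
    if is_null_procedure:
        # The rule is stated only for requests other than the NULL
        # procedure, so this sketch places no restriction on NULL.
        return True
    # A retry is allowed only when the connection that carried the
    # request was lost before the reply was received.
    return connection_lost and not reply_received
```

For example, `retry_permitted(False, False, False)` evaluates to `False`: while the original connection is still up, the requester keeps waiting rather than retrying.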
| <t> | ||||
| When sending a reply, the replier <bcp14>MUST</bcp14> send the reply | ||||
| to the same full network address (e.g., if using an | ||||
| IP-based transport, the source port of the requester | ||||
| is part of the full network address) from which the requester | ||||
| sent the request. If using a connection-oriented | ||||
| transport, replies <bcp14>MUST</bcp14> be sent on the same connection from which | ||||
| the request was received. | ||||
| </t> | ||||
| <t> | ||||
| If a connection is dropped after the replier receives | ||||
| the request but before the replier sends the reply, the | ||||
| replier might have a pending reply. | ||||
| If a connection is established with the same | ||||
| source and destination full network address as the | ||||
| dropped connection, then the replier <bcp14>MUST NOT</bcp14> send | ||||
| the reply until the requester retries the request. The | ||||
| reason for this prohibition is that the requester <bcp14>MAY</bcp14> | ||||
| retry a request over a different connection (provided that connection | ||||
| is associated with the original request's session). | ||||
| </t> | ||||
| <t> | ||||
| When using RDMA transports, there are other reasons for not | ||||
| tolerating retries over the same connection: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| RDMA transports use "credits" to enforce flow control, where | ||||
| a credit is a right granted to a peer to transmit a message. | ||||
| If one peer were to retransmit a request (or reply), it would | ||||
| consume an additional credit. | ||||
| If the replier | ||||
| retransmitted a reply, it would certainly result in an RDMA | ||||
| connection loss, since the requester would typically only post a | ||||
| single receive buffer for each request. If the requester | ||||
| retransmitted a request, the additional credit consumed on the | ||||
| server might lead to RDMA connection failure unless the client | ||||
| accounted for it and decreased its available credit, leading to | ||||
| wasted resources. | ||||
| </li> | ||||
| <li> | ||||
| RDMA credits present a new issue to the reply cache in | ||||
| NFSv4.1. The reply cache may be used when a connection within a | ||||
| session is lost, such as after the client reconnects. Credit | ||||
| information is a dynamic property of the RDMA connection, and stale | ||||
| values must not be replayed from the cache. This implies that the | ||||
| reply cache contents must not be blindly used when replies are | ||||
| sent from it, and credit information appropriate to the channel | ||||
| must be refreshed by the RPC layer. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In addition, as described in | ||||
| <xref target="Retry_and_Replay" format="default"/>, while a session is active, | ||||
| the NFSv4.1 requester <bcp14>MUST NOT</bcp14> stop waiting for a reply. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Client and Server Transport Behavior --> | ||||
| <section anchor="Ports" numbered="true" toc="default"> | ||||
| <name>Ports</name> | ||||
| <t> | ||||
| Historically, NFSv3 servers have listened on | ||||
| TCP port 2049. The registered port 2049 <xref target="RFC3232" format="default"/> | ||||
| for the NFS protocol should be the default configuration. NFSv4.1 | ||||
| clients <bcp14>SHOULD NOT</bcp14> use the RPC binding protocols as described in | ||||
| <xref target="RFC1833" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Ports --> | ||||
| </section> | ||||
| <!-- [auth] Transport Layers --> | ||||
| <section anchor="Session" numbered="true" toc="default"> | ||||
| <name>Session</name> | ||||
| <t> | ||||
| NFSv4.1 clients and servers <bcp14>MUST</bcp14> support and <bcp14>MUST</bcp14> use the session | ||||
| feature as described in this section. | ||||
| </t> | ||||
| <section anchor="Motivation_and_Overview" numbered="true" toc="default"> | ||||
| <name>Motivation and Overview</name> | ||||
| <t> | ||||
| Previous versions and minor versions of NFS have suffered from | ||||
| the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Lack of support for Exactly Once Semantics (EOS). This includes | ||||
| lack of support for EOS through server failure and recovery. | ||||
| </li> | ||||
| <li> | ||||
| Limited callback support, including no support for sending callbacks | ||||
| through firewalls, and races between replies to normal requests | ||||
| and callbacks. | ||||
| </li> | ||||
| <li> | ||||
| Limited trunking over multiple network paths. | ||||
| </li> | ||||
| <li> | ||||
| Requiring machine credentials for fully secure operation. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Through the introduction of a session, NFSv4.1 addresses the | ||||
| above shortfalls with practical solutions: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| EOS is enabled by a reply cache with a bounded size, | ||||
| making it feasible to keep the cache in persistent storage and enable | ||||
| EOS through server failure and recovery. One reason that | ||||
| previous revisions of NFS did not support EOS was | ||||
| because some EOS approaches often limited parallelism. | ||||
| As will be explained in | ||||
| <xref target="Exactly_Once_Semantics" format="default"/>, | ||||
| NFSv4.1 supports both EOS and unlimited parallelism. | ||||
| </li> | ||||
| <li> | ||||
| The NFSv4.1 client (defined in <xref target="client_def" format="default"/>) creates transport | ||||
| connections and provides them to the server to use for sending | ||||
| callback requests, thus solving the firewall issue | ||||
| (<xref target="OP_BIND_CONN_TO_SESSION" format="default"/>). Races between | ||||
| responses from client requests and callbacks caused by | ||||
| the requests are detected via the session's sequencing | ||||
| properties that are a consequence of EOS | ||||
| (<xref target="sessions_callback_races" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| The NFSv4.1 client can associate an arbitrary number of connections with | ||||
| the session, and thus provide trunking (<xref target="Trunking" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| The NFSv4.1 client and server produce a session key independent of client | ||||
| and server machine credentials which can be | ||||
| used to compute a digest for protecting critical session management operations | ||||
| (<xref target="protect_state_change" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| The NFSv4.1 client can also create secure RPCSEC_GSS contexts | ||||
| for use by the session's backchannel that do not require | ||||
| the server to authenticate to a client machine principal | ||||
| (<xref target="Backchannel_RPC_Security" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| A session is a dynamically created, long-lived server object | ||||
| created by a client and used over time from one or more transport | ||||
| connections. Its function is to maintain the server's state | ||||
| relative to the connection(s) belonging to a client instance. This | ||||
| state is entirely independent of the connection itself, and indeed | ||||
| the state exists whether or not the connection exists. A client may | ||||
| have one or more sessions associated with it so that | ||||
| client-associated state may be accessed using any of the sessions | ||||
| associated with that client's client ID, when connections are | ||||
| associated with those sessions. When no connections are associated | ||||
| with any of a client ID's sessions for an extended time, such | ||||
| objects as locks, opens, delegations, layouts, etc. are subject to | ||||
| expiration. The session serves as an object representing a means | ||||
| of access by a client to the associated client state on the server, | ||||
| independent of the physical means of access to that state. | ||||
| </t> | ||||
| <t> | ||||
| A single client may create multiple sessions. A single session <bcp14>MUST | ||||
| NOT</bcp14> serve multiple clients. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Motivation and Overview --> | ||||
| <section anchor="NFSv4_Integration" numbered="true" toc="default"> | ||||
| <name>NFSv4 Integration</name> | ||||
| <t> | ||||
| Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major | ||||
| infrastructure change such as sessions would require a new major | ||||
| version number to an Open Network Computing (ONC) RPC program like | ||||
| NFS. However, because NFSv4 encapsulates its functionality in a single procedure, COMPOUND, | ||||
| and because COMPOUND can support an arbitrary number of | ||||
| operations, sessions have been added to NFSv4.1 with little difficulty. COMPOUND includes | ||||
| a minor version number field, and for NFSv4.1 this minor version | ||||
| is set to 1. When the NFSv4 server processes a COMPOUND with | ||||
| the minor version set to 1, it expects a different set of | ||||
| operations than it does for NFSv4.0. NFSv4.1 defines the | ||||
| SEQUENCE operation, which is required for every | ||||
| COMPOUND that operates over an established session, with the | ||||
| exception of some session administration operations, such | ||||
| as DESTROY_SESSION (<xref target="OP_DESTROY_SESSION" format="default"/>). | ||||
| </t> | ||||
| <section anchor="SEQUENCE_and_CB_SEQUENCE" numbered="true" toc="default"> | ||||
| <name>SEQUENCE and CB_SEQUENCE</name> | ||||
| <t> | ||||
| In NFSv4.1, when the SEQUENCE operation is present, it <bcp14>MUST</bcp14> be | ||||
| the first operation in the COMPOUND procedure. The primary purpose | ||||
| of SEQUENCE is to carry the session identifier. The session identifier | ||||
| associates all other operations in the COMPOUND procedure with | ||||
| a particular session. SEQUENCE also contains required information | ||||
| for maintaining EOS (see <xref target="Exactly_Once_Semantics" format="default"/>). | ||||
| Session-enabled NFSv4.1 COMPOUND requests thus have the form: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| +-----+--------------+-----------+------------+-----------+---- | ||||
| | tag | minorversion | numops |SEQUENCE op | op + args | ... | ||||
| | | (== 1) | (limited) | + args | | | ||||
| +-----+--------------+-----------+------------+-----------+---- | ||||
| ]]></artwork> | ||||
| <t> | ||||
| and the replies have the form: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| +------------+-----+--------+-------------------------------+--// | ||||
| |last status | tag | numres |status + SEQUENCE op + results | // | ||||
| +------------+-----+--------+-------------------------------+--// | ||||
| //-----------------------+---- | ||||
| // status + op + results | ... | ||||
| //-----------------------+---- | ||||
| ]]></artwork> | ||||
| <t> | ||||
| A CB_COMPOUND procedure request and reply have a similar form to | ||||
| COMPOUND, but | ||||
| instead of a SEQUENCE operation, there is a CB_SEQUENCE operation. | ||||
| CB_COMPOUND also has an additional field called "callback_ident", which | ||||
| is superfluous in NFSv4.1 and <bcp14>MUST</bcp14> be ignored by | ||||
| the client. CB_SEQUENCE has the same information | ||||
| as SEQUENCE, and also includes other information needed to resolve | ||||
| callback races | ||||
| (<xref target="sessions_callback_races" format="default"/>). | ||||
| </t> | ||||
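The placement rule for SEQUENCE can be checked mechanically. The sketch below assumes, purely for illustration, that a COMPOUND request has already been decoded into a list of operation names given as strings; the function name `sequence_placement_ok` is hypothetical.

```python
from typing import Sequence


def sequence_placement_ok(op_names: Sequence[str]) -> bool:
    """Hypothetical check: SEQUENCE, when present, must be the first operation.

    A COMPOUND without SEQUENCE passes this particular check; whether such a
    request is otherwise acceptable is outside the scope of this sketch.
    """
    # SEQUENCE must not appear at any position after the first.
    return all(name != "SEQUENCE" for name in op_names[1:])
```

With this definition, `["SEQUENCE", "PUTFH", "READ"]` passes, while `["PUTFH", "SEQUENCE"]` does not.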
| </section> | ||||
| <!-- [auth] SEQUENCE and CB_SEQUENCE --> | ||||
| <section anchor="Client_ID_and_Session_Association" numbered="true" toc="default"> | ||||
| <name>Client ID and Session Association</name> | ||||
| <t> | ||||
| Each client ID (<xref target="Client_Identifiers" format="default"/>) can have | ||||
| zero or more active sessions. A client ID and associated | ||||
| session are required to perform file access in | ||||
| NFSv4.1. Each time a session is used (whether by a client sending | ||||
| a request to the server or the client replying to a callback | ||||
| request from the server), the state leased to its associated | ||||
| client ID is automatically renewed. | ||||
| </t> | ||||
| <t> | ||||
| State (which can consist of share reservations, locks, delegations, | ||||
| and layouts (<xref target="intro_locking" format="default"/>)) is tied to | ||||
| the client ID. Client state is not tied to any individual session. | ||||
| Successive state changing operations from a given state | ||||
| owner <bcp14>MAY</bcp14> go over different sessions, provided the | ||||
| session is associated with the same client ID. A callback | ||||
| <bcp14>MAY</bcp14> arrive over a different session than that of the request | ||||
| that originally acquired the state pertaining to the | ||||
| callback. For example, if session A is used to | ||||
| acquire a delegation, a request to recall the | ||||
| delegation <bcp14>MAY</bcp14> arrive over session B if both sessions are | ||||
| associated with the same client ID. Sections | ||||
| <xref target="Session_Callback_Security" format="counter"/> and | ||||
| <xref target="Backchannel_RPC_Security" format="counter"/> discuss | ||||
| the security considerations around callbacks. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Client ID and Session Association --> | ||||
| </section> | ||||
| <!-- [auth] NFSv4 Integration --> | ||||
| <section anchor="Channels" numbered="true" toc="default"> | ||||
| <name>Channels</name> | ||||
| <t> | ||||
| A channel is not a connection. A channel represents the | ||||
| direction ONC RPC requests are sent. | ||||
| </t> | ||||
| <t> | ||||
| Each session has one or two channels: the fore channel and the backchannel. | ||||
| Because there are at most two channels per session, and because each | ||||
| channel has a distinct purpose, channels are not assigned | ||||
| identifiers. | ||||
| </t> | ||||
| <t> | ||||
| The fore channel is | ||||
| used for ordinary requests from the client to the server, and | ||||
| carries COMPOUND requests and responses. | ||||
| A session always has a fore channel. | ||||
| </t> | ||||
| <t> | ||||
| The backchannel is used for callback requests from server | ||||
| to client, and carries CB_COMPOUND requests and responses. | ||||
| Whether or not there is a backchannel is decided by the | ||||
| client; however, many features of NFSv4.1 require a backchannel. | ||||
| NFSv4.1 servers <bcp14>MUST</bcp14> support backchannels. | ||||
| </t> | ||||
| <t> | ||||
| Each session has resources for each channel, | ||||
| including separate reply caches (see | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/>). | ||||
| Note that even the backchannel requires a reply cache (or, at least, | ||||
| a slot table in order to detect retries) because | ||||
| some callback operations are non-idempotent. | ||||
| </t> | ||||
| <section anchor="conn_chann_assoc" numbered="true" toc="default"> | ||||
| <name>Association of Connections, Channels, and Sessions</name> | ||||
| <t> | ||||
| Each channel is associated with zero or more transport | ||||
| connections (whether of the same transport protocol or different | ||||
| transport protocols). A connection can be associated with | ||||
| one channel or both channels of a session; the client | ||||
| and server negotiate whether a connection will carry | ||||
| traffic for one channel or both channels via the | ||||
| CREATE_SESSION (<xref target="OP_CREATE_SESSION" format="default"/>) and the BIND_CONN_TO_SESSION (<xref target="OP_BIND_CONN_TO_SESSION" format="default"/>) operations. When a | ||||
| session is created via CREATE_SESSION, the connection | ||||
| that transported the CREATE_SESSION request is | ||||
| automatically associated with the fore channel, and | ||||
| optionally the backchannel. If the client specifies no | ||||
| state protection (<xref target="OP_EXCHANGE_ID" format="default"/>) | ||||
| when the session is created, then when SEQUENCE is | ||||
| transmitted on a different connection, the connection | ||||
| is automatically associated with the fore channel of | ||||
| the session specified in the SEQUENCE operation. | ||||
| </t> | ||||
| <t> | ||||
| A connection's association with a session is | ||||
| not exclusive. A connection associated with the channel(s) | ||||
| of one session may be simultaneously | ||||
| associated with the channel(s) of other sessions including | ||||
| sessions associated with other client IDs. | ||||
| </t> | ||||
| <t> | ||||
| It is permissible for connections of multiple transport | ||||
| types to be associated with the same channel. For | ||||
| example, both TCP and RDMA connections can be | ||||
| associated with the fore channel. In the event an | ||||
| RDMA and non-RDMA connection are associated with the | ||||
| same channel, the maximum number of slots <bcp14>SHOULD</bcp14> be | ||||
| at least one more than the total number of RDMA credits | ||||
| (<xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/>). | ||||
| This way, if all RDMA credits are used, the non-RDMA | ||||
| connection can have at least one outstanding request. | ||||
| If a server supports multiple transport types, it <bcp14>MUST</bcp14> | ||||
| allow a client to associate connections from each transport | ||||
| to a channel. | ||||
| </t> | ||||
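The slot-sizing recommendation for a channel that mixes RDMA and non-RDMA connections amounts to a one-line comparison. The following is a minimal sketch; the function and parameter names are hypothetical, and the check reflects a SHOULD-level recommendation rather than a hard requirement.

```python
def leaves_room_for_non_rdma(max_slots: int, rdma_credits: int) -> bool:
    """Hypothetical check of the recommendation for mixed-transport channels."""
    # With at least one slot beyond the total RDMA credits, a non-RDMA
    # connection can still have an outstanding request when every RDMA
    # credit is in use.
    return max_slots >= rdma_credits + 1
```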
| <t> | ||||
| It is permissible for a connection of one type of | ||||
| transport to be associated with the fore channel, | ||||
| and a connection of a different type to be associated | ||||
| with the backchannel. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] Channels --> | ||||
| <section anchor="Server_Scope" numbered="true" toc="default"> | ||||
| <name>Server Scope</name> | ||||
| <t> | ||||
| Servers each specify a server scope value in the form | ||||
| of an opaque string eir_server_scope returned as part of | ||||
| the results of an EXCHANGE_ID operation. The purpose of | ||||
| the server scope is to allow a group of servers to | ||||
| indicate to clients that a set of servers sharing the | ||||
| same server scope value has arranged to use distinct | ||||
| values of opaque identifiers so that no two of those servers ever | ||||
| assign the same value to two distinct objects. Thus, the identifiers | ||||
| generated by two servers within that set can be assumed compatible | ||||
| so that, in certain important cases, | ||||
| identifiers generated by one server in that set may be | ||||
| presented to | ||||
| another server of the same scope. | ||||
| </t> | ||||
| <t> | ||||
| The use of such compatible values does not imply that | ||||
| a value generated by one server will always be accepted | ||||
| by another. In most cases, it will not. However, a | ||||
| server will not inadvertently accept a value generated by another | ||||
| server. When it does accept it, it will be because | ||||
| it is recognized as valid and carrying the same meaning | ||||
| as on another server of the same scope. | ||||
| </t> | ||||
| <t> | ||||
| When servers are of the same server scope, this compatibility | ||||
| of values applies to the following identifiers: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Filehandle values. A filehandle value accepted by two | ||||
| servers of the same server scope denotes the same object. | ||||
| A WRITE operation sent to one server is reflected immediately | ||||
| in a READ sent to the other. | ||||
| </li> | ||||
| <li> | ||||
| Server owner values. When the server scope values are | ||||
| the same, server owner values may be validly compared. | ||||
| In cases where the server scope values are different, server | ||||
| owner values are treated as different even if they | ||||
| contain identical strings of bytes. | ||||
| </li> | ||||
| </ul> | ||||
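The comparison rule for server owner values can be sketched as follows. As a simplifying assumption, both the server scope and the server owner are treated here as opaque byte strings; the function name `server_owners_match` is hypothetical.

```python
def server_owners_match(scope_a: bytes, owner_a: bytes,
                        scope_b: bytes, owner_b: bytes) -> bool:
    """Hypothetical comparison of server owner values from two servers."""
    if scope_a != scope_b:
        # Different server scopes: the owner values are treated as
        # different even if they contain identical bytes.
        return False
    # Same server scope: the owner values may be validly compared.
    return owner_a == owner_b
```

Note that `server_owners_match(b"s1", b"own", b"s2", b"own")` is `False`: byte-identical owner values carry no meaning across different scopes.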
| <t> | ||||
| The coordination among servers required to provide such | ||||
| compatibility can be quite minimal, and limited to a simple | ||||
| partition of the ID space. The recognition of common values | ||||
| requires additional implementation, but this can be tailored | ||||
| to the specific situations in which that recognition is | ||||
| desired. | ||||
| </t> | ||||
| <t> | ||||
| Clients will have occasion to compare the server scope values | ||||
| of multiple servers under a number of circumstances, each of | ||||
| which will be discussed under the appropriate functional | ||||
| section: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When server owner values received in response to | ||||
| EXCHANGE_ID operations sent to multiple network | ||||
| addresses are compared for the purpose of determining | ||||
| the validity of various forms of trunking, as described | ||||
| in <xref target="SEC11-USES-trunk" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| When network or server reconfiguration causes the same | ||||
| network address to possibly be directed to different | ||||
| servers, with the necessity for the client to determine | ||||
| when lock reclaim should be attempted, as described | ||||
| in <xref target="reclaim_locks" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When two replies from EXCHANGE_ID, each from a different | ||||
| server network address, have the same server scope, there | ||||
| are a number of ways a client can validate that the common | ||||
| server scope is due to two servers cooperating in a group. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If both EXCHANGE_ID requests were sent with RPCSEC_GSS | ||||
| (<xref target="RFC2203" format="default"/>, <xref target="RFC5403" format="default"/>, | ||||
| <xref target="RFC7861" format="default"/>) | ||||
| authentication and the server principal is the same for | ||||
| both targets, the equality of server scope is validated. | ||||
| It is <bcp14>RECOMMENDED</bcp14> that two servers intending to share the | ||||
| same server scope and server_owner major_id also share the | ||||
| same principal name. In some cases, this | ||||
| simplifies the client's task of validating server scope. | ||||
| </li> | ||||
| <li> | ||||
| The client may accept the appearance of the second | ||||
| server in the fs_locations or fs_locations_info attribute | ||||
| for a relevant file system. For example, if there is | ||||
| a migration event for a particular file system | ||||
| or there are locks to be reclaimed on a particular file | ||||
| system, the attributes for that particular file system | ||||
| may be used. The client sends the GETATTR request to | ||||
| the first server for the fs_locations or | ||||
| fs_locations_info attribute with RPCSEC_GSS | ||||
| authentication. It may need to do this in advance | ||||
| of the need to verify the common server scope. | ||||
| If the client successfully authenticates the reply | ||||
| to GETATTR, and the fs_locations or fs_locations_info | ||||
| attribute in that reply refers | ||||
| to the second server, then the equality of server scope | ||||
| is supported. A client may choose to limit the use of | ||||
| this form of support to information relevant to the | ||||
| specific file system involved (e.g., a file system | ||||
| being migrated). | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="Trunking" numbered="true" toc="default"> | ||||
| <name>Trunking</name> | ||||
| <t> | ||||
| Trunking is the use of multiple connections between a | ||||
| client and server in order to increase the speed of data | ||||
| transfer. NFSv4.1 supports two types of trunking: | ||||
| session trunking and client ID trunking. | ||||
| </t> | ||||
| <t> | ||||
| In the context of a single server network address, it | ||||
| can be assumed that all connections are accessing the | ||||
| same server, and NFSv4.1 | ||||
| servers <bcp14>MUST</bcp14> support both forms of trunking. When | ||||
| multiple connections use a set of network addresses | ||||
| to access the same server, the server | ||||
| <bcp14>MUST</bcp14> support both forms of trunking. | ||||
| NFSv4.1 servers in a clustered configuration <bcp14>MAY</bcp14> allow | ||||
| network addresses for different servers to use client ID | ||||
| trunking. | ||||
| </t> | ||||
| <t> | ||||
| Clients may use either form of trunking as long as they | ||||
| do not, when trunking between different server network | ||||
| addresses, violate the servers' mandates as to the | ||||
| kinds of trunking to be allowed (see below). With regard | ||||
| to callback channels, the client <bcp14>MUST</bcp14> allow the server to | ||||
| choose among all callback channels valid for a given | ||||
| client ID and <bcp14>MUST</bcp14> support trunking when the connections | ||||
| supporting the backchannel allow session or client ID | ||||
| trunking to be used for callbacks. | ||||
| </t> | ||||
| <t> | ||||
| Session trunking is essentially the association of multiple | ||||
| connections, each with potentially different target and/or source | ||||
| network addresses, to the same session. When the target network | ||||
| addresses (server addresses) of the two connections are the same, | ||||
| the server <bcp14>MUST</bcp14> | ||||
| support such session trunking. When the target network addresses | ||||
| are different, the server <bcp14>MAY</bcp14> indicate such support using the | ||||
| data returned by the EXCHANGE_ID operation (see below). | ||||
| </t> | ||||
| <t> | ||||
| Client ID trunking is the association of multiple | ||||
| sessions to the same client ID. Servers <bcp14>MUST</bcp14> support client ID | ||||
| trunking for two target network addresses whenever they allow | ||||
| session trunking for those same two network addresses. | ||||
| In addition, a server <bcp14>MAY</bcp14>, by presenting the same | ||||
| major server owner ID | ||||
| (<xref target="Server_Owners" format="default"/>) and server scope | ||||
| (<xref target="Server_Scope" format="default"/>), allow an additional | ||||
| case of client ID trunking. When two | ||||
| servers return the same major server owner and server | ||||
| scope, it means that the two servers are cooperating on | ||||
| locking state management, which is a prerequisite | ||||
| for client ID trunking. | ||||
| </t> | ||||
| <t> | ||||
| Distinguishing when the client is allowed to use session and | ||||
| client ID trunking requires understanding how the results of the | ||||
| EXCHANGE_ID (<xref target="OP_EXCHANGE_ID" format="default"/>) | ||||
| operation identify a server. | ||||
| Suppose a client sends EXCHANGE_IDs over two different | ||||
| connections, each with a possibly different target | ||||
| network address, but each EXCHANGE_ID operation has the same | ||||
| value in the eia_clientowner field. If the same | ||||
| NFSv4.1 server is listening over each connection, | ||||
| then each EXCHANGE_ID result <bcp14>MUST</bcp14> return the same | ||||
| values of eir_clientid, eir_server_owner.so_major_id, | ||||
| and eir_server_scope. The client can then treat each | ||||
| connection as referring to the same server (subject | ||||
| to verification; see | ||||
| <xref target="PREP-trunk-verify" format="default"/> below), | ||||
| and it can use each connection to trunk requests and | ||||
| replies. | ||||
| The client's choice is whether session trunking | ||||
| or client ID trunking applies. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Session Trunking.</dt> | ||||
| <dd> | ||||
| <t> | ||||
| If the eia_clientowner argument is the same in | ||||
| two different EXCHANGE_ID requests, and | ||||
| the eir_clientid, eir_server_owner.so_major_id, | ||||
| eir_server_owner.so_minor_id, and eir_server_scope | ||||
| results match in both EXCHANGE_ID results, then | ||||
| the client is permitted to perform session trunking. | ||||
| If the client has no session mapping to the tuple of | ||||
| eir_clientid, eir_server_owner.so_major_id, eir_server_scope, and | ||||
| eir_server_owner.so_minor_id, then it creates | ||||
| the session via a CREATE_SESSION operation over one | ||||
| of the connections, which associates the connection | ||||
| to the session. If there is a session for the tuple, | ||||
| the client can send BIND_CONN_TO_SESSION to associate | ||||
| the connection to the session. | ||||
| </t> | ||||
| <t> | ||||
| Of course, if the client | ||||
| does not desire to use session trunking, it is not | ||||
| required to do so. It can invoke | ||||
| CREATE_SESSION on the connection. This will result | ||||
| in client ID trunking as described below. It can also | ||||
| decide to drop the connection if it does not choose to | ||||
| use trunking. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>Client ID Trunking.</dt> | ||||
| <dd> | ||||
| <t> | ||||
| If the eia_clientowner argument is the same in | ||||
| two different EXCHANGE_ID requests, and | ||||
| the eir_clientid, eir_server_owner.so_major_id, | ||||
| and eir_server_scope | ||||
| results match in both EXCHANGE_ID results, then | ||||
| the client is permitted to perform client ID trunking | ||||
| (regardless of whether the eir_server_owner.so_minor_id results match). | ||||
| The client can associate | ||||
| each connection with different sessions, where | ||||
| each session is associated with the same server. | ||||
| </t> | ||||
| <t> | ||||
| The client completes the act of client ID trunking by invoking | ||||
| CREATE_SESSION on each connection, using the same | ||||
| client ID that was returned in eir_clientid. These | ||||
| invocations create two sessions and also associate | ||||
| each connection with its respective session. The client | ||||
| is free to decline to use client ID trunking by simply | ||||
| dropping the connection at this point. | ||||
| </t> | ||||
| <t> | ||||
| When doing client ID trunking, locking state | ||||
| is shared across sessions associated with that same | ||||
| client ID. This requires the server to coordinate | ||||
| state across sessions and the client to be able to | ||||
| associate the same locking state with multiple sessions. | ||||
| </t> | ||||
| </dd> | ||||
| </dl> | ||||
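| <t> | ||||
| As a minimal, non-normative sketch, the two matching rules above can be | ||||
| expressed as a single comparison. The ExchangeIdResult type and the | ||||
| permitted_trunking function are hypothetical names introduced only for | ||||
| illustration; the sketch assumes that both EXCHANGE_ID requests carried | ||||
| the same eia_clientowner and that any verification the client wants has | ||||
| been done separately. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Fields of an EXCHANGE_ID reply that matter for trunking (illustrative)."""
    clientid: int        # eir_clientid
    so_major_id: bytes   # eir_server_owner.so_major_id
    so_minor_id: int     # eir_server_owner.so_minor_id
    server_scope: bytes  # eir_server_scope


def permitted_trunking(a: ExchangeIdResult, b: ExchangeIdResult) -> str:
    """Return the strongest form of trunking the two replies permit.

    Assumes both EXCHANGE_ID requests used the same eia_clientowner.
    """
    same_server = (
        a.clientid == b.clientid
        and a.so_major_id == b.so_major_id
        and a.server_scope == b.server_scope
    )
    if not same_server:
        return "none"
    if a.so_minor_id == b.so_minor_id:
        # Session trunking is permitted; client ID trunking remains an option.
        return "session"
    # Minor IDs differ: only client ID trunking is permitted.
    return "client_id"
```

| <t> | ||||
| A result of "session" does not oblige the client to use session trunking; | ||||
| it may still create a separate session or drop the connection. | ||||
| </t> | ||||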
| <t> | ||||
| It is always possible that, as a result of various sorts | ||||
| of reconfiguration events, eir_server_scope and | ||||
| eir_server_owner values may be different on subsequent | ||||
| EXCHANGE_ID requests made to the same network address. | ||||
| </t> | ||||
| <t> | ||||
| In most cases, such reconfiguration events will be | ||||
| disruptive and indicate that an IP address formerly connected | ||||
| to one server is now connected to an entirely different one. | ||||
| </t> | ||||
| <t> | ||||
| Some guidelines on client handling of such situations follow: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When eir_server_scope changes, the client has no assurance | ||||
| that any IDs that it obtained previously (e.g., filehandles) can | ||||
| be validly used on the new server, and, even if the new | ||||
| server accepts them, there is no assurance that this is not | ||||
| due to accident. Thus, it is best to treat all such state | ||||
| as lost or stale, although a client may assume that the | ||||
| probability of inadvertent acceptance is low and treat | ||||
| this situation as within the next case. | ||||
| </li> | ||||
| <li> | ||||
| When eir_server_scope remains the same and | ||||
| eir_server_owner.so_major_id changes, the client can use | ||||
| the filehandles it has, consider its locking state lost, | ||||
| and attempt | ||||
| to reclaim or otherwise re-obtain its locks. It might find | ||||
| that | ||||
| its filehandle is now stale. However, if NFS4ERR_STALE is not | ||||
| returned, it can proceed to reclaim or otherwise re-obtain its | ||||
| open locking state. | ||||
| </li> | ||||
| <li> | ||||
| When eir_server_scope and | ||||
| eir_server_owner.so_major_id remain the same, | ||||
| the client has to use the now-current values | ||||
| of eir_server_owner.so_minor_id in deciding on appropriate | ||||
| forms of trunking. This may result in connections being | ||||
| dropped or new sessions being created. | ||||
| </li> | ||||
| </ul> | ||||
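| <t> | ||||
| The guidelines above amount to checking the changed values in a fixed | ||||
| order. The following non-normative sketch shows that ordering. The | ||||
| function name, the dictionary keys, and the returned labels are all | ||||
| assumptions made for illustration, and the "no_change" outcome is added | ||||
| only to make the function total. | ||||
| </t> | ||||

```python
def reconfiguration_action(old: dict, new: dict) -> str:
    """Map a change in EXCHANGE_ID results to the guideline that applies.

    Both arguments are hypothetical dicts with the keys "scope",
    "major_id", and "minor_id", taken from eir_server_scope and
    eir_server_owner in the earlier and the later reply.
    """
    if old["scope"] != new["scope"]:
        # Filehandles and other IDs cannot be trusted on the new server.
        return "treat_state_as_lost"
    if old["major_id"] != new["major_id"]:
        # Filehandles stay usable unless NFS4ERR_STALE is returned;
        # locking state is considered lost and is reclaimed or re-obtained.
        return "reclaim_locks"
    if old["minor_id"] != new["minor_id"]:
        # Same scope and major ID: the now-current minor ID decides
        # which forms of trunking are appropriate.
        return "reevaluate_trunking"
    return "no_change"
```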
| <section anchor="PREP-trunk-verify" numbered="true" toc="default"> | ||||
| <name>Verifying Claims of Matching Server Identity</name> | ||||
| <t> | ||||
| When the server responds using two different connections that claim | ||||
| matching or partially matching eir_server_owner, | ||||
| eir_server_scope, and eir_clientid values, the client | ||||
| does not have to trust the servers' claims. The client | ||||
| may verify these claims before trunking traffic in | ||||
| the following ways: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| For session trunking, | ||||
| clients <bcp14>SHOULD</bcp14> | ||||
| reliably verify if connections between different | ||||
| network paths are in fact associated with the same NFSv4.1 | ||||
| server and usable on the same session, and servers | ||||
| <bcp14>MUST</bcp14> allow clients to perform reliable verification. | ||||
| When a client ID is created, the client <bcp14>SHOULD</bcp14> specify that | ||||
| BIND_CONN_TO_SESSION is to be verified according to the | ||||
| SP4_SSV or SP4_MACH_CRED (<xref target="OP_EXCHANGE_ID" format="default"/>) | ||||
| state protection options. For SP4_SSV, reliable | ||||
| verification depends on a shared secret (the | ||||
| SSV) that is established via the SET_SSV (see | ||||
| <xref target="OP_SET_SSV" format="default"/>) operation. | ||||
| </t> | ||||
| <t> | ||||
| When a new connection is associated with the | ||||
| session (via the BIND_CONN_TO_SESSION operation, | ||||
| see <xref target="OP_BIND_CONN_TO_SESSION" format="default"/>), if | ||||
| the client specified SP4_SSV state protection for the | ||||
| BIND_CONN_TO_SESSION operation, the client <bcp14>MUST</bcp14> send | ||||
| the BIND_CONN_TO_SESSION with RPCSEC_GSS protection, | ||||
| using integrity or privacy, and an RPCSEC_GSS handle created | ||||
| with the GSS SSV mechanism (see <xref target="ssv_mech" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| If the client mistakenly tries to associate a | ||||
| connection to a session of a wrong server, the | ||||
| server will either reject the attempt because | ||||
| it is not aware of the session identifier of the | ||||
| BIND_CONN_TO_SESSION arguments, or it will reject | ||||
| the attempt because the RPCSEC_GSS authentication | ||||
| fails. Even if the server mistakenly or maliciously | ||||
| accepts the connection association attempt, the | ||||
| RPCSEC_GSS verifier it computes in the response | ||||
| will not be verified by the client, so the client will | ||||
| know it cannot use the connection for trunking the | ||||
| specified session. </t> | ||||
| <t> If the | ||||
| client specified SP4_MACH_CRED state protection, the | ||||
| BIND_CONN_TO_SESSION operation will use RPCSEC_GSS | ||||
| integrity or privacy, using the same credential that | ||||
| was used when the client ID was created. Mutual | ||||
| authentication via RPCSEC_GSS assures the client | ||||
| that the connection is associated with the correct | ||||
| session of the correct server. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| For client ID trunking, the client has at least two | ||||
| options for verifying that the same client ID | ||||
| obtained from two different EXCHANGE_ID operations | ||||
| came from the same server. The first option is | ||||
| to use RPCSEC_GSS authentication when sending each | ||||
| EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with | ||||
| RPCSEC_GSS authentication, the client notes the | ||||
| principal name of the GSS target. If the EXCHANGE_ID | ||||
| results indicate that client ID trunking is possible, | ||||
| and the GSS targets' principal names are the same, | ||||
| the servers are the same and client ID trunking is | ||||
| allowed. | ||||
| </t> | ||||
| <t> | ||||
| The second option for verification is to | ||||
| use SP4_SSV protection. When the client sends | ||||
| EXCHANGE_ID, it specifies SP4_SSV protection. The | ||||
| first EXCHANGE_ID the client sends always has to | ||||
| be confirmed by a CREATE_SESSION call. The client | ||||
| then sends SET_SSV. Later, the client | ||||
| sends EXCHANGE_ID to a second destination | ||||
| network address different from the one the first | ||||
| EXCHANGE_ID was sent to. | ||||
| The client checks that each EXCHANGE_ID reply has the | ||||
| same eir_clientid, eir_server_owner.so_major_id, and | ||||
| eir_server_scope. If so, the client verifies the | ||||
| claim by sending a CREATE_SESSION operation to the second | ||||
| destination address, protected with RPCSEC_GSS integrity | ||||
| using an RPCSEC_GSS handle returned by the second | ||||
| EXCHANGE_ID. If the server accepts the CREATE_SESSION | ||||
| request, and if the client verifies the RPCSEC_GSS | ||||
| verifier and integrity codes, then the client has | ||||
| proof the second server knows the SSV, and thus | ||||
| the two servers are cooperating for the purposes of | ||||
| specifying server scope and client ID trunking. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="Exactly_Once_Semantics" numbered="true" toc="default"> | ||||
| <name>Exactly Once Semantics</name> | ||||
| <t> | ||||
| Via the session, NFSv4.1 offers exactly once semantics (EOS) | ||||
| for requests sent over a channel. EOS is supported on both the | ||||
| fore channel and backchannel. | ||||
| </t> | ||||
| <t> | ||||
| Each COMPOUND or CB_COMPOUND request that is sent | ||||
| with a leading SEQUENCE or CB_SEQUENCE operation <bcp14>MUST</bcp14> | ||||
| be executed by the receiver exactly once. This requirement | ||||
| holds regardless of whether the request is sent with reply | ||||
| caching specified (see <xref target="optional_reply_caching" format="default"/>). | ||||
| The requirement holds even if the requester is sending the | ||||
| request over a session created between a pNFS data client | ||||
| and pNFS data server. To understand the rationale for this requirement, | ||||
| divide the requests into three | ||||
| classifications: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Non-idempotent requests. | ||||
| </li> | ||||
| <li> | ||||
| Idempotent modifying requests. | ||||
| </li> | ||||
| <li> | ||||
| Idempotent non-modifying requests. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| An example of a non-idempotent request is | ||||
| RENAME. Obviously, if a replier executes the | ||||
| same RENAME request twice, and the first execution succeeds, | ||||
| the re-execution will fail. If the replier returns the | ||||
| result from the re-execution, this result is incorrect. | ||||
| Therefore, EOS is required for non-idempotent requests. | ||||
| </t> | ||||
| <t> | ||||
| An example of an idempotent modifying request is | ||||
| a COMPOUND request containing a WRITE operation. | ||||
| Repeated execution of the same WRITE | ||||
| has the same effect as execution of that WRITE a single time. | ||||
| Nevertheless, enforcing EOS for WRITEs and other idempotent | ||||
| modifying requests is necessary | ||||
| to avoid data corruption. | ||||
| </t> | ||||
| <t> | ||||
| Suppose a client sends WRITE A to a | ||||
| noncompliant server that does not enforce EOS, and | ||||
| receives no response, perhaps due to a network | ||||
| partition. The client reconnects to the server and | ||||
| re-sends WRITE A. Now, the server has | ||||
| two outstanding instances of A. The | ||||
| server can be in a situation in which it executes and | ||||
| replies to the retry of A, while the first | ||||
| A is still waiting in the server's internal I/O system for some | ||||
| resource. Upon receiving the | ||||
| reply to the second attempt of WRITE A, | ||||
| the client believes its WRITE is done so it is free | ||||
| to send WRITE B, which overlaps the byte-range of | ||||
| A. When the original A is dispatched from the server's | ||||
| I/O system and | ||||
| executed (thus the second time A will have | ||||
| been written), then what has been | ||||
| written by B can be overwritten and thus corrupted. | ||||
| </t> | ||||
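| <t> | ||||
| The scenario above can be replayed on an in-memory buffer. This is an | ||||
| illustrative sketch only: the buffer contents, offsets, and function | ||||
| names are invented for the example, and the ordering of the three writes | ||||
| is the one described in the preceding paragraph. | ||||
| </t> | ||||

```python
def apply_write(data: bytearray, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes of data starting at offset."""
    data[offset:offset + len(payload)] = payload


def replay_without_eos() -> bytes:
    """Replay WRITE A (retry), WRITE B, then the delayed original WRITE A."""
    file_data = bytearray(b"........")
    write_a = (0, b"AAAA")  # covers bytes 0..3
    write_b = (2, b"BBBB")  # covers bytes 2..5, overlapping A

    apply_write(file_data, *write_a)  # the retry of A executes and is acknowledged
    apply_write(file_data, *write_b)  # the client, believing A is done, sends B
    apply_write(file_data, *write_a)  # the original A is finally dispatched
    return bytes(file_data)


def replay_with_eos() -> bytes:
    """With EOS, A executes exactly once, so B is never overwritten."""
    file_data = bytearray(b"........")
    apply_write(file_data, 0, b"AAAA")
    apply_write(file_data, 2, b"BBBB")
    return bytes(file_data)
```

| <t> | ||||
| In this sketch, the two functions return different contents: the late | ||||
| execution of the original A overwrites the first two bytes written by B. | ||||
| </t> | ||||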
| <t> | ||||
| An example of an idempotent non-modifying request | ||||
| is a COMPOUND containing SEQUENCE, PUTFH, READLINK, | ||||
| and nothing else. The re-execution of such a | ||||
| request will not cause data corruption or | ||||
| produce an incorrect result. Nonetheless, | ||||
| to keep the implementation simple, | ||||
| the replier <bcp14>MUST</bcp14> enforce EOS for all requests, whether or not | ||||
| idempotent and non-modifying. | ||||
| </t> | ||||
| <t> | ||||
| Note that true and complete EOS is not possible unless the | ||||
| server persists the reply cache in stable storage, and unless the | ||||
| server is somehow implemented to never require a restart | ||||
| (indeed, if such a server exists, the distinction between a | ||||
| reply cache kept in stable storage versus one that is not is | ||||
| one without meaning). See <xref target="Persistence" format="default"/> for | ||||
| a discussion of persistence in the reply cache. | ||||
| Regardless, even if the server does not persist the reply cache, | ||||
| EOS improves robustness and correctness over previous versions | ||||
| of NFS because the legacy duplicate request/reply caches were | ||||
| based on the ONC RPC transaction identifier (XID). | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/> | ||||
| explains the shortcomings of the XID as a basis for | ||||
| a reply cache and describes how NFSv4.1 sessions improve | ||||
| upon the XID. | ||||
| </t> | ||||
| <section anchor="Slot_Identifiers_and_Server_Reply_Cache" numbered="true" toc="default"> | ||||
| <name>Slot Identifiers and Reply Cache</name> | ||||
| <t> | ||||
| The RPC layer provides a transaction ID (XID), which, | ||||
| while required to be unique, is not | ||||
| convenient for tracking requests for two reasons. | ||||
| First, the XID is only | ||||
| meaningful to the requester; it cannot be interpreted | ||||
| by the replier except to test for equality with | ||||
| previously sent requests. When consulting an RPC-based | ||||
| duplicate request cache, the opaqueness of the XID requires | ||||
| a computationally expensive lookup (often via a hash that | ||||
| includes XID and source address). NFSv4.1 requests use | ||||
| a non-opaque slot ID, which is an index into a slot table, | ||||
| which is far more efficient. Second, because RPC requests | ||||
| can be executed by the replier in any order, there is | ||||
| no bound on the number of requests that may be outstanding | ||||
| at any time. To achieve perfect EOS, using ONC RPC | ||||
| would require storing all replies in the reply cache. | ||||
| XIDs are 32 bits; storing over four billion (2<sup>32</sup>) replies | ||||
| in the reply cache is not practical. In practice, previous versions | ||||
| of NFS have chosen to store a fixed number of replies in | ||||
| the cache, and to use a least recently used (LRU) approach to | ||||
| replacing cache entries with new entries when the cache | ||||
| is full. In NFSv4.1, the number of outstanding requests is | ||||
| bounded by the size of the slot table, and a sequence ID | ||||
| per slot is used to tell the replier when it is safe to | ||||
| delete a cached reply. | ||||
| </t> | ||||
| <t> | ||||
| In the NFSv4.1 reply cache, when the requester sends a new request, | ||||
| it selects a slot ID in the | ||||
| range 0..N, where N is the replier's current maximum slot ID | ||||
| granted to the requester on the session over which the request is to be | ||||
| sent. The value of N starts out as equal to | ||||
| ca_maxrequests - 1 (<xref target="OP_CREATE_SESSION" format="default"/>), but | ||||
| can be adjusted by the response to SEQUENCE or CB_SEQUENCE as described | ||||
| later in this section. | ||||
| The slot ID must be unused by any of the requests that the | ||||
| requester has already active on the session. "Unused" here means the | ||||
| requester has no outstanding request for that slot ID. | ||||
| </t> | ||||
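| <t> | ||||
| A requester's slot selection might look like the following non-normative | ||||
| sketch. The function name and the representation of in-use slots as a | ||||
| set are assumptions; choosing the lowest unused slot ID is one reasonable | ||||
| policy, not the only one the text above permits. | ||||
| </t> | ||||

```python
from typing import Optional, Set


def pick_slot(in_use: Set[int], max_slot_id: int) -> Optional[int]:
    """Return the lowest slot ID in 0..max_slot_id with no outstanding request.

    Returns None when every slot currently has a request outstanding, in
    which case the requester has to wait for a reply before sending.
    """
    for slot_id in range(max_slot_id + 1):
        if slot_id not in in_use:
            return slot_id
    return None
```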
| <t> | ||||
| A slot contains a sequence ID and the cached reply corresponding to | ||||
| the request sent with that sequence ID. The sequence ID is a | ||||
| 32-bit unsigned value, and is therefore in the range 0..0xFFFFFFFF (2<sup>32</sup> - 1). | ||||
| The first time a slot is used, the requester <bcp14>MUST</bcp14> specify | ||||
| a sequence ID of one (<xref target="OP_CREATE_SESSION" format="default"/>). | ||||
| Each time a slot is reused, the request <bcp14>MUST</bcp14> specify a sequence ID | ||||
| that is one greater than that of the previous request on the | ||||
| slot. If the previous sequence ID was 0xFFFFFFFF, then the next | ||||
| request for the slot <bcp14>MUST</bcp14> have the sequence ID set to zero (i.e., | ||||
| (2<sup>32</sup> - 1) + 1 mod 2<sup>32</sup>). | ||||
| </t> | ||||
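| <t> | ||||
| The wraparound rule is ordinary arithmetic modulo 2^32. The following | ||||
| short sketch illustrates it; the constant and function names are | ||||
| hypothetical. | ||||
| </t> | ||||

```python
SEQUENCE_ID_MODULUS = 2**32  # sequence IDs are 32-bit unsigned values
FIRST_SEQUENCE_ID = 1        # value required the first time a slot is used


def next_sequence_id(previous: int) -> int:
    """Return the sequence ID for the next request on the same slot."""
    return (previous + 1) % SEQUENCE_ID_MODULUS
```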
| <t> | ||||
| The sequence ID accompanies the slot ID in each request. It | ||||
| supports the critical check at the replier: it is used to efficiently | ||||
| determine whether a request using a certain | ||||
| slot ID is a retransmit or a new, never-before-seen request. | ||||
| Having the requester assert that it is retransmitting is not a feasible way to | ||||
| implement this check, because for any given request the requester cannot | ||||
| know whether the replier has seen it unless the replier actually replies. Of | ||||
| course, if the requester has seen the reply, the requester would | ||||
| not retransmit. | ||||
| </t> | ||||
| <t> | ||||
| The replier compares each received request's | ||||
| sequence ID with the last one previously received for that slot ID, | ||||
| to see if the new request is: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A new request, in which the sequence ID is one greater | ||||
| than that previously seen in the slot (accounting for sequence | ||||
| wraparound). The replier proceeds to execute the new request, | ||||
| and the replier | ||||
| <bcp14>MUST</bcp14> increase the slot's sequence ID by one. | ||||
| </li> | ||||
| <li> | ||||
| A retransmitted request, in which the sequence ID is equal to | ||||
| that currently recorded in the slot. | ||||
| If the original request has | ||||
| executed to completion, the replier returns the cached | ||||
| reply. See <xref target="Retry_and_Replay" format="default"/> for direction on how the replier | ||||
| deals with retries of requests that are still in progress. | ||||
| </li> | ||||
| <li> | ||||
| A misordered retry, in which the sequence ID | ||||
| is less than (accounting for sequence wraparound) | ||||
| that previously seen in the slot. The | ||||
| replier <bcp14>MUST</bcp14> return NFS4ERR_SEQ_MISORDERED (as the | ||||
| result from SEQUENCE or CB_SEQUENCE). | ||||
| </li> | ||||
| <li> | ||||
| A misordered new request, in which the sequence ID | ||||
| is two or more greater than (accounting for sequence | ||||
| wraparound) that previously seen in the | ||||
| slot. Note that because the sequence ID <bcp14>MUST</bcp14> | ||||
| wrap around to zero once it reaches 0xFFFFFFFF, a | ||||
| misordered new request and a misordered retry | ||||
| cannot be distinguished. Thus, the replier <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_SEQ_MISORDERED (as the result from | ||||
| SEQUENCE or CB_SEQUENCE). | ||||
| </li> | ||||
| </ul> | ||||
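| <t> | ||||
| The four cases above reduce to two equality tests, because a misordered | ||||
| retry and a misordered new request receive the same error. The following | ||||
| is a minimal sketch of the replier's check; the Verdict enumeration and | ||||
| the function name are hypothetical, and handling of a retry whose | ||||
| original request is still in progress is deliberately left out. | ||||
| </t> | ||||

```python
from enum import Enum

SEQUENCE_ID_MODULUS = 2**32  # sequence IDs wrap from 0xFFFFFFFF to zero


class Verdict(Enum):
    NEW = "execute the request and advance the slot's sequence ID"
    RETRANSMIT = "return the cached reply"
    MISORDERED = "return NFS4ERR_SEQ_MISORDERED"


def classify_request(slot_seqid: int, request_seqid: int) -> Verdict:
    """Compare a request's sequence ID with the one recorded in its slot."""
    if request_seqid == slot_seqid:
        return Verdict.RETRANSMIT
    if request_seqid == (slot_seqid + 1) % SEQUENCE_ID_MODULUS:
        return Verdict.NEW
    # Anything else is a misordered retry or a misordered new request;
    # because of wraparound the two cannot be told apart.
    return Verdict.MISORDERED
```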
| <t> | ||||
| Unlike the XID, the slot ID is always within a specific | ||||
| range; this has two implications. The first | ||||
| implication is that for a given session, the replier | ||||
| need only cache the results of a limited number of | ||||
| COMPOUND requests. | ||||
| The second implication derives | ||||
| from the first, which is that unlike XID-indexed reply | ||||
| caches (also known as duplicate request caches - DRCs), | ||||
| the slot ID-based reply cache cannot be overflowed. | ||||
| Through use of the sequence ID to identify | ||||
| retransmitted requests, the replier does not need to | ||||
| actually cache the request itself, reducing the | ||||
| storage requirements of the reply cache further. These | ||||
| facilities make it practical to maintain all the | ||||
| required entries for an effective reply cache. | ||||
| </t> | ||||
| <t> | ||||
| The slot ID, sequence ID, and session ID therefore take over the traditional role | ||||
| of the XID and source network address in the replier's | ||||
| reply cache implementation. | ||||
| This approach is considerably | ||||
| more portable and completely robust -- it is not subject to the | ||||
| reassignment of ports as clients reconnect over IP | ||||
| networks. In addition, the RPC XID is not used in the reply cache, | ||||
| enhancing robustness of the cache in the face of any rapid reuse of | ||||
| XIDs by the requester. While the replier does not care | ||||
| about the XID for the purposes of reply cache management | ||||
| (but the replier <bcp14>MUST</bcp14> return the same XID that was in the request), | ||||
| nonetheless there are considerations for the XID in NFSv4.1 | ||||
| that are the same as all other previous versions of NFS. | ||||
| The RPC XID remains in each message and needs to be formulated | ||||
| in NFSv4.1 requests as in any other ONC RPC request. The reasons | ||||
| include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The RPC layer retains its existing semantics and implementation. | ||||
| </li> | ||||
| <li> | ||||
| The requester and replier must be able to interoperate at the | ||||
| RPC layer, prior to the NFSv4.1 decoding of the SEQUENCE or CB_SEQUENCE | ||||
| operation. | ||||
| </li> | ||||
| <li> | ||||
| If an operation is being used that does not start with | ||||
| SEQUENCE or CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), | ||||
| then the RPC XID is needed for correct operation to | ||||
| match the reply to the request. | ||||
| </li> | ||||
| <li> | ||||
| The SEQUENCE or CB_SEQUENCE operation may generate an error. | ||||
| If so, the embedded slot ID, sequence ID, and session ID (if | ||||
| present) in the request will not be in the reply, and the | ||||
| requester has only the XID to match the reply to the request. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Given that well-formulated XIDs continue to be required, | ||||
| this raises the question: why do SEQUENCE and CB_SEQUENCE replies | ||||
| have a session ID, slot ID, and sequence ID? Having the session ID | ||||
| in the reply means that the requester does not have to use the | ||||
| XID to look up | ||||
| the session ID, which would be necessary if the connection were | ||||
| associated with multiple sessions. Having the slot ID and sequence ID | ||||
| in the reply means that the requester does not have to use the XID to | ||||
| look up the slot ID and sequence ID. | ||||
| Furthermore, since the XID is only 32 bits, it is too small to | ||||
| guarantee the re-association of a reply with its request | ||||
| <xref target="rpc_xid_issues" format="default"/>; having | ||||
| session ID, slot ID, and sequence ID in the reply allows the | ||||
| client to validate that the reply in fact belongs to the matched request. | ||||
| </t> | ||||
| <t> | ||||
| The SEQUENCE (and CB_SEQUENCE) operation also carries | ||||
| a "highest_slotid" value, which carries additional | ||||
| requester slot usage information. The requester <bcp14>MUST</bcp14> | ||||
| always indicate the slot ID representing the outstanding request with the | ||||
| highest-numbered slot | ||||
| value. | ||||
| The requester should in all cases provide the most | ||||
| conservative value possible, although it can be increased somewhat | ||||
| above the actual instantaneous usage to maintain some minimum or | ||||
| optimal level. This provides a way for the requester to yield unused | ||||
| request slots back to the replier, which in turn can use the | ||||
| information to reallocate resources. | ||||
| </t> | ||||
| <t> | ||||
| The replier | ||||
| responds with both a new target highest_slotid and an | ||||
| enforced highest_slotid, described as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The target highest_slotid is | ||||
| an indication to the requester of the highest_slotid the replier | ||||
| wishes the requester to be using. This permits the replier to withdraw | ||||
| (or add) resources from a requester that has been found to not be | ||||
| using them, in order to more fairly share resources among a varying | ||||
| level of demand from other requesters. The requester must always comply | ||||
| with the replier's value updates, since they indicate newly | ||||
| established hard limits on the requester's access to session | ||||
| resources. However, because of request pipelining, the requester may | ||||
| have active requests in flight reflecting prior values; therefore, | ||||
| the replier must not immediately require the requester to comply. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The enforced highest_slotid indicates the highest slot ID | ||||
| the requester is permitted to use on a subsequent SEQUENCE or | ||||
| CB_SEQUENCE operation. The replier's enforced highest_slotid <bcp14>SHOULD</bcp14> | ||||
| be no less than the highest_slotid the requester indicated | ||||
| in the SEQUENCE or CB_SEQUENCE arguments. | ||||
| </t> | ||||
| <t> | ||||
| A requester can be intransigent with respect to lowering its | ||||
| highest_slotid argument to a SEQUENCE operation, i.e., the requester | ||||
| continues to ignore the target highest_slotid in the response to | ||||
| a SEQUENCE operation, and continues to set its highest_slotid | ||||
| argument to be higher than the target highest_slotid. This can | ||||
| be considered particularly egregious behavior when the replier | ||||
| knows there are no outstanding requests with slot IDs higher than | ||||
| its target highest_slotid. When faced with such intransigence, | ||||
| the replier is free to take more forceful action, and <bcp14>MAY</bcp14> reply with | ||||
| a new enforced highest_slotid that is less than its previous | ||||
| enforced highest_slotid. Thereafter, if the requester continues | ||||
| to send requests with a highest_slotid that is greater than | ||||
| the replier's new enforced highest_slotid, the server <bcp14>MAY</bcp14> return | ||||
| NFS4ERR_BAD_HIGH_SLOT, unless the slot ID in the request is greater | ||||
| than the new enforced highest_slotid and the request is a retry. | ||||
| </t> | ||||
| <t> | ||||
| The replier <bcp14>SHOULD</bcp14> retain the slots it wants to retire | ||||
| until | ||||
| the requester sends a request with a highest_slotid less than | ||||
| or equal to the replier's new enforced highest_slotid. | ||||
| </t> | ||||
| <t> | ||||
| The requester can also be intransigent with | ||||
| respect to sending non-retry requests that have a slot ID that | ||||
| exceeds the replier's highest_slotid. | ||||
| Once the replier has forcibly lowered the enforced | ||||
| highest_slotid, the requester is only allowed to | ||||
| send retries on slots that exceed the replier's highest_slotid. | ||||
| If a request is received with a slot ID that is higher than | ||||
| the new enforced highest_slotid, and the sequence ID | ||||
| is one higher than what is in the slot's reply cache, then | ||||
| the server can both retire the slot and return NFS4ERR_BADSLOT | ||||
| (however, the server <bcp14>MUST NOT</bcp14> do one and not the other). | ||||
| It is safe to retire the slot | ||||
| because, by using the next sequence ID, the requester | ||||
| is indicating it has received the previous reply for the | ||||
| slot. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| The requester <bcp14>SHOULD</bcp14> use the lowest available | ||||
| slot when sending a new request. This way, the | ||||
| replier may be able to retire slot entries faster. | ||||
| However, where the replier is actively adjusting | ||||
| its granted highest_slotid, | ||||
| it will not be able | ||||
| to use only the receipt of the slot ID and highest_slotid | ||||
| in the request. Neither the slot ID nor the | ||||
| highest_slotid used in a request may reflect the | ||||
| replier's current idea of the requester's session | ||||
| limit, because the request may have been sent from the | ||||
| requester before the update was received. Therefore, | ||||
| in the downward adjustment case, the replier may have | ||||
| to retain a number of reply cache entries at least as | ||||
| large as the old value of maximum requests | ||||
| outstanding, until it can infer that the requester | ||||
| has seen a reply containing the new granted highest_slotid. | ||||
| The replier can infer that the requester has seen such a | ||||
| reply when it receives a new request with the same | ||||
| slot ID as the request replied to and the next higher | ||||
| sequence ID. | ||||
| </li> | ||||
| </ul> | ||||
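The enforced highest_slotid rules above can be sketched roughly as follows. This is a minimal illustration and not part of the protocol definition: the SlotTable class, the check_enforced_limit function, and the treatment of an already retired slot are assumptions made for the example, and sequence ID wraparound is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

NFS4ERR_BAD_HIGH_SLOT = "NFS4ERR_BAD_HIGH_SLOT"
NFS4ERR_BADSLOT = "NFS4ERR_BADSLOT"


@dataclass
class SlotTable:
    """Hypothetical replier-side state for one session (illustrative names)."""
    enforced_highest_slotid: int
    # slot ID -> sequence ID of the reply held in that slot's cache entry
    cached_seqid: Dict[int, int] = field(default_factory=dict)


def check_enforced_limit(table: SlotTable, slot_id: int, seq_id: int,
                         highest_slotid_arg: int) -> Optional[str]:
    """Return an error name, or None if the enforced limit raises no objection."""
    cached = table.cached_seqid.get(slot_id)

    if slot_id > table.enforced_highest_slotid:
        if cached is not None and seq_id == cached:
            # A retry on a slot above the enforced limit is still allowed.
            return None
        if cached is not None and seq_id == cached + 1:
            # The next sequence ID shows the previous reply was received, so
            # the slot is retired and the error is returned together.
            del table.cached_seqid[slot_id]
            return NFS4ERR_BADSLOT
        if cached is None:
            # Assumption of this sketch: an already retired slot is rejected.
            return NFS4ERR_BADSLOT
    if highest_slotid_arg > table.enforced_highest_slotid:
        # The replier MAY reject a requester that ignores the enforced value.
        return NFS4ERR_BAD_HIGH_SLOT
    return None
```

Retiring the slot and returning NFS4ERR_BADSLOT happen in the same branch, which mirrors the requirement that the server not do one without the other.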
| <section anchor="cacheseq" numbered="true" toc="default"> | ||||
| <name>Caching of SEQUENCE and CB_SEQUENCE Replies</name> | ||||
| <t> | ||||
| When a SEQUENCE or CB_SEQUENCE operation is | ||||
| successfully executed, its reply <bcp14>MUST</bcp14> always be | ||||
| cached. Specifically, session ID, sequence ID, | ||||
| and slot ID <bcp14>MUST</bcp14> be cached in the reply cache. | ||||
| The reply from SEQUENCE also includes the highest | ||||
| slot ID, target highest slot ID, and status flags. Instead | ||||
| of caching these values, the server <bcp14>MAY</bcp14> | ||||
| re-compute the values from the current | ||||
| state of the fore channel, session, and/or client | ||||
| ID as appropriate. Similarly, the reply from | ||||
| CB_SEQUENCE includes a highest slot ID and target | ||||
| highest slot ID. The client | ||||
| <bcp14>MAY</bcp14> re-compute the values from the | ||||
| current state of the session as appropriate. | ||||
| </t> | ||||
| <t> | ||||
| Regardless of whether or not a replier is re-computing highest slot ID, | ||||
| target slot ID, and status on replies to retries, the requester | ||||
| <bcp14>MUST NOT</bcp14> assume that the values are being re-computed whenever it | ||||
| receives a reply after a retry is sent, since it has no way | ||||
| of knowing whether the reply it has received was sent by the | ||||
| replier in response to the retry or is a delayed response to | ||||
| the original request. Therefore, it may be the case that | ||||
| highest slot ID, target slot ID, or status bits may reflect | ||||
| the state of affairs when the request was first executed. | ||||
| Although acting based on such delayed information is valid, | ||||
| it may cause the receiver of the reply to do unneeded work. Requesters | ||||
| <bcp14>MAY</bcp14> choose to send additional requests to get the current | ||||
| state of affairs or use the state of affairs reported by | ||||
| subsequent requests, in preference to acting immediately | ||||
| on data that might be out of date. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_sequence" numbered="true" toc="default"> | ||||
| <name>Errors from SEQUENCE and CB_SEQUENCE</name> | ||||
| <t> | ||||
| Any time SEQUENCE or CB_SEQUENCE returns an error, the | ||||
| sequence ID of the slot <bcp14>MUST NOT</bcp14> change. The replier <bcp14>MUST NOT</bcp14> | ||||
| modify the reply cache entry for the slot whenever an error | ||||
| is returned from SEQUENCE or CB_SEQUENCE. | ||||
| </t> | ||||
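As a rough illustration of the rule above, the following sketch updates a slot only when the Sequence operation succeeds. The Slot class and record_sequence_result function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

NFS4_OK = "NFS4_OK"


@dataclass
class Slot:
    """Hypothetical reply cache entry for one slot."""
    seqid: int = 0
    cached_reply: Optional[bytes] = None


def record_sequence_result(slot: Slot, status: str, reply: bytes) -> None:
    """Update the slot after SEQUENCE or CB_SEQUENCE has been processed."""
    if status != NFS4_OK:
        # An error leaves both the sequence ID and the cached reply untouched.
        return
    slot.seqid += 1
    slot.cached_reply = reply
```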
| </section> | ||||
| <!-- [auth] Errors from SEQUENCE and CB_SEQUENCE --> | ||||
| <section anchor="optional_reply_caching" numbered="true" toc="default"> | ||||
| <name>Optional Reply Caching</name> | ||||
| <t> | ||||
| On a per-request basis, the requester can choose to | ||||
| direct the replier to cache the reply to all operations | ||||
| after the first operation (SEQUENCE or CB_SEQUENCE) via | ||||
| the sa_cachethis or csa_cachethis fields of the arguments | ||||
| to SEQUENCE or CB_SEQUENCE. | ||||
| The reason it would not direct the replier to cache | ||||
| the entire reply is that the request is composed of all | ||||
| idempotent operations <xref target="Chet" format="default"/>. | ||||
| Caching the reply may offer little benefit. If | ||||
| the reply is too large (see | ||||
| <xref target="COMPOUND_Sizing_Issues" format="default"/>), | ||||
| it may not be cacheable anyway. Even if the reply to | ||||
| an idempotent request is small enough to cache, unnecessarily | ||||
| caching the reply slows down the server and increases | ||||
| RPC latency. | ||||
| </t> | ||||
| <t> | ||||
| Whether or not the requester requests the reply to be cached | ||||
| has no effect on the slot processing. If the | ||||
| result of SEQUENCE or CB_SEQUENCE is NFS4_OK, then | ||||
| the slot's sequence ID <bcp14>MUST</bcp14> be incremented by one. | ||||
| If a requester does not direct the replier to cache | ||||
| the reply, the replier <bcp14>MUST</bcp14> do one of the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The replier can cache the entire original reply. | ||||
| Even though sa_cachethis or csa_cachethis is FALSE, | ||||
| the replier is always free to cache. It may choose | ||||
| this approach in order to simplify implementation. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The replier enters into its reply cache a reply consisting | ||||
| of the original results to the SEQUENCE or CB_SEQUENCE | ||||
| operation, and with the next operation in | ||||
| COMPOUND or CB_COMPOUND having the error NFS4ERR_RETRY_UNCACHED_REP. | ||||
| Thus, if the requester later retries the request, it will | ||||
| get NFS4ERR_RETRY_UNCACHED_REP. | ||||
| If a replier receives a retried Sequence operation where the reply | ||||
| to the COMPOUND or CB_COMPOUND was not cached, then the replier, | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <bcp14>MAY</bcp14> return NFS4ERR_RETRY_UNCACHED_REP | ||||
| in reply to a Sequence operation if the | ||||
| Sequence operation is not the first | ||||
| operation (granted, a requester that | ||||
| does so is in violation of the NFSv4.1 | ||||
| protocol). | ||||
| </li> | ||||
| <li> | ||||
| <bcp14>MUST NOT</bcp14> return | ||||
| NFS4ERR_RETRY_UNCACHED_REP in reply to | ||||
| a Sequence operation if the Sequence | ||||
| operation is the first operation. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| If the second operation is an illegal operation, or an | ||||
| operation that was legal in a previous minor version of | ||||
| NFSv4 and <bcp14>MUST NOT</bcp14> | ||||
| be supported in the current minor version (e.g., SETCLIENTID), the | ||||
| replier <bcp14>MUST NOT</bcp14> ever return NFS4ERR_RETRY_UNCACHED_REP. | ||||
| Instead the replier <bcp14>MUST</bcp14> return NFS4ERR_OP_ILLEGAL or | ||||
| NFS4ERR_BADXDR or NFS4ERR_NOTSUPP as appropriate. | ||||
| </li> | ||||
| <li> | ||||
| If the second operation can result in another error status, | ||||
| the replier <bcp14>MAY</bcp14> return a status other than NFS4ERR_RETRY_UNCACHED_REP, | ||||
| provided the operation is not executed in such a way that the state | ||||
| of the replier is changed. Examples of such | ||||
| an error status include: NFS4ERR_NOTSUPP returned for an | ||||
| operation that is legal but not <bcp14>REQUIRED</bcp14> in the current | ||||
| minor version, and thus not supported by the replier; | ||||
| NFS4ERR_SEQUENCE_POS; and NFS4ERR_REQ_TOO_BIG. | ||||
| </li> | ||||
| </ul> | ||||
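The minimal cache entry described above, kept when the requester did not ask for the reply to be cached, might be built along these lines. This is a sketch under stated assumptions: the operation sets are small illustrative subsets, the mapping of SETCLIENTID to NFS4ERR_NOTSUPP is an assumption of the example, and uncached_entry is a hypothetical helper name.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"
NFS4ERR_OP_ILLEGAL = "NFS4ERR_OP_ILLEGAL"
NFS4ERR_NOTSUPP = "NFS4ERR_NOTSUPP"

# Illustrative subsets only; a real replier would cover every operation.
KNOWN_OPS = {"SEQUENCE", "PUTFH", "GETATTR", "READ", "RENAME", "SETCLIENTID"}
MUST_NOT_SUPPORT = {"SETCLIENTID"}


def uncached_entry(second_op):
    """Build the entry stored when the full reply is not cached."""
    entry = [("SEQUENCE", NFS4_OK)]
    if second_op is None:
        # The request held only the Sequence operation.
        return entry
    if second_op not in KNOWN_OPS:
        status = NFS4ERR_OP_ILLEGAL
    elif second_op in MUST_NOT_SUPPORT:
        # Assumed mapping for an operation that must not be supported.
        status = NFS4ERR_NOTSUPP
    else:
        status = NFS4ERR_RETRY_UNCACHED_REP
    entry.append((second_op, status))
    return entry
```

A retry of the same request would then be answered from this entry, so the requester sees NFS4_OK for the Sequence operation followed by the stored status on the second operation.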
| <t> | ||||
| The discussion above assumes that the | ||||
| retried request matches the original | ||||
| one. <xref target="false_retry" format="default"/> | ||||
| discusses what the replier might do, and | ||||
| <bcp14>MUST</bcp14> do when original and retried requests do not match. | ||||
| Since the replier may | ||||
| only cache a small amount of the | ||||
| information that would be required to | ||||
| determine whether this is a case of a | ||||
| false retry, the replier may send to the | ||||
| client any of the following responses: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The cached reply to the original request (if the replier has cached | ||||
| it in its entirety and the users of the original request and retry match). | ||||
| </li> | ||||
| <li> | ||||
| A reply that consists only of the Sequence operation with the error | ||||
| NFS4ERR_FALSE_RETRY. | ||||
| </li> | ||||
| <li> | ||||
| A reply consisting of the response to Sequence with the status | ||||
| NFS4_OK, together with the second operation as it appeared in the retried | ||||
| request with an error of NFS4ERR_RETRY_UNCACHED_REP or other error as | ||||
| described above. | ||||
| </li> | ||||
| <li> | ||||
| A reply that consists of the response to Sequence with the status | ||||
| NFS4_OK, together with the second operation as it appeared in the original | ||||
| request with an error of NFS4ERR_RETRY_UNCACHED_REP or other error as | ||||
| described above. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="false_retry" numbered="true" toc="default"> | ||||
| <name>False Retry</name> | ||||
| <t> | ||||
| If a requester sent a Sequence operation | ||||
| with a slot ID and sequence ID that are | ||||
| in the reply cache but the replier | ||||
| detected that the retried request is not | ||||
| the same as the original request, | ||||
| including a retry that has different | ||||
| operations or different arguments in the | ||||
| operations from the original and a retry | ||||
| that uses a different principal in the | ||||
| RPC request's credential field that | ||||
| translates to a different user, then this | ||||
| is a false retry. When the replier | ||||
| detects a false retry, it is permitted | ||||
| (but not always obligated) to return | ||||
| NFS4ERR_FALSE_RETRY in response to the | ||||
| Sequence operation. | ||||
| </t> | ||||
| <t> | ||||
| Translations of particularly privileged | ||||
| user values to other users due to the | ||||
| lack of appropriately secure credentials, | ||||
| as configured on the replier, should be | ||||
| applied before determining whether the | ||||
| users are the same or different. If the | ||||
| replier determines the users are | ||||
| different between the original request | ||||
| and a retry, then the replier <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_FALSE_RETRY. | ||||
| </t> | ||||
| <t> | ||||
| If an operation of the retry is an | ||||
| illegal operation, or an operation that | ||||
| was legal in a previous minor version of | ||||
| NFSv4 and <bcp14>MUST NOT</bcp14> be supported in the | ||||
| current minor version (e.g., SETCLIENTID), | ||||
| the replier <bcp14>MAY</bcp14> return | ||||
| NFS4ERR_FALSE_RETRY (and <bcp14>MUST</bcp14> do so if | ||||
| the users of the original request and | ||||
| retry differ). Otherwise, the replier <bcp14>MAY</bcp14> return | ||||
| NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or | ||||
| NFS4ERR_NOTSUPP as appropriate. Note | ||||
| that this handling contrasts with how the | ||||
| replier deals with retried requests that have | ||||
| no cached reply. The difference is due to | ||||
| NFS4ERR_FALSE_RETRY being a valid error | ||||
| for only Sequence operations, whereas | ||||
| NFS4ERR_RETRY_UNCACHED_REP is a valid | ||||
| error for all operations except illegal | ||||
| operations and operations that <bcp14>MUST NOT</bcp14> be | ||||
| supported in the current minor version of | ||||
| NFSv4. | ||||
| </t> | ||||
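One possible way to express the false retry decision is sketched below. The Request class and false_retry_check function are hypothetical; the user field is assumed to hold the user after any translation configured on the replier, and the sketch assumes the replier retained enough of the original request to compare it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical summary of a request bound to a slot ID and sequence ID."""
    user: str              # user after any replier-configured translation
    ops: Tuple[str, ...]   # operations together with their arguments


def false_retry_check(original: Request, retry: Request) -> Tuple[bool, bool]:
    """Return (is_false_retry, must_return_false_retry_error)."""
    if original.user != retry.user:
        # Differing users: NFS4ERR_FALSE_RETRY is mandatory.
        return True, True
    if original.ops != retry.ops:
        # Differing operations or arguments: the error is permitted only.
        return True, False
    return False, False
```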
| </section> | ||||
| </section> | ||||
| <!-- [auth] Optional Reply Caching --> | ||||
| </section> | ||||
| <!-- [auth] Slot Identifiers and Server Reply Cache --> | ||||
| <section anchor="Retry_and_Replay" numbered="true" toc="default"> | ||||
| <name>Retry and Replay of Reply</name> | ||||
| <t> | ||||
| A requester <bcp14>MUST NOT</bcp14> retry a request, unless | ||||
| the connection it used to send the request | ||||
| disconnects. The requester can then reconnect | ||||
| and re-send the request, or it can re-send the | ||||
| request over a different connection that is | ||||
| associated with the same session. | ||||
| </t> | ||||
| <t> | ||||
| If the requester is a server wanting to re-send a callback | ||||
| operation over the backchannel of a session, the requester | ||||
| of course cannot reconnect because only the client can | ||||
| associate connections with the backchannel. The | ||||
| server can re-send the request over another connection that | ||||
| is bound to the same session's backchannel. If there is no | ||||
| such connection, the server | ||||
| <bcp14>MUST</bcp14> indicate that the session has no backchannel by setting | ||||
| the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag bit in the response | ||||
| to the next SEQUENCE operation from the client. The client <bcp14>MUST</bcp14> | ||||
| then associate a connection with the session (or destroy | ||||
| the session). | ||||
| </t> | ||||
| <t> | ||||
| Note that it is not fatal for a requester to retry | ||||
| without a disconnect between the request and retry. | ||||
| However, the retry does consume resources, especially | ||||
| with RDMA, where each request, retry or not, consumes | ||||
| a credit. Retries for no reason, especially retries | ||||
| sent shortly after the previous attempt, are a poor | ||||
| use of network bandwidth and defeat the purpose of a | ||||
| transport's inherent congestion control system. | ||||
| </t> | ||||
| <t> | ||||
| A requester <bcp14>MUST</bcp14> wait for a reply to a request before using | ||||
| the slot for another request. If it does not wait for | ||||
| a reply, then the requester does not know what | ||||
| sequence ID to use for the slot on its next request. | ||||
| For example, suppose a requester sends a request with sequence ID | ||||
| 1, and does not wait for the response. The next time it uses | ||||
| the slot, it sends the new request with sequence ID 2. | ||||
| If the replier has not seen the request with sequence ID 1, then | ||||
| the replier is not expecting sequence ID 2, and rejects the | ||||
| requester's new request with NFS4ERR_SEQ_MISORDERED (as the | ||||
| result from SEQUENCE or CB_SEQUENCE). | ||||
| </t> | ||||
| <t> | ||||
| RDMA fabrics do not guarantee that the memory handles | ||||
| (Steering Tags) within each RPC/RDMA "chunk" <xref target="RFC8166" format="default"/> | ||||
| are valid on a scope | ||||
| outside that of a single connection. Therefore, handles used by | ||||
| the direct operations become invalid after connection loss. The | ||||
| server must ensure that any RDMA operations that must be replayed | ||||
| from the reply cache use the newly provided handle(s) from the | ||||
| most recent request. | ||||
| </t> | ||||
| <t> | ||||
| A retry might be sent while the original request is still in | ||||
| progress on the replier. The replier <bcp14>SHOULD</bcp14> deal with the issue | ||||
| by returning NFS4ERR_DELAY as the reply to the SEQUENCE or CB_SEQUENCE | ||||
| operation, but implementations <bcp14>MAY</bcp14> return NFS4ERR_SEQ_MISORDERED. | ||||
| Since errors from SEQUENCE and CB_SEQUENCE are | ||||
| never recorded in the reply cache, this approach allows the | ||||
| results of the execution of the original request to be | ||||
| properly recorded in the reply cache (assuming that the requester | ||||
| specified the reply to be cached). | ||||
| </t> | ||||
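The sequence ID handling described in this section can be sketched as a small classifier. The Slot class, the classify function, and the REPLAY and EXECUTE labels are illustrative names; the sketch assumes the slot records the sequence ID of the last completed request and treats sequence IDs as 32-bit values that wrap.

```python
from dataclasses import dataclass

REPLAY = "replay cached reply"
EXECUTE = "execute new request"
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


@dataclass
class Slot:
    """Hypothetical replier-side slot state."""
    seqid: int = 0             # sequence ID of the last completed request
    in_progress: bool = False  # True while the next request is still executing


def classify(slot: Slot, seqid: int) -> str:
    """Decide how the replier treats a request arriving on this slot."""
    next_seqid = (slot.seqid + 1) & 0xFFFFFFFF
    if seqid == slot.seqid:
        # Same sequence ID as the cached reply: this is a retry.
        return REPLAY
    if seqid == next_seqid:
        if slot.in_progress:
            # Retry of a request whose original is still being executed.
            return NFS4ERR_DELAY
        return EXECUTE
    return NFS4ERR_SEQ_MISORDERED
```

With a fresh slot (sequence ID 0), a request carrying sequence ID 2 is classified as misordered, which matches the example above of a requester that did not wait for the reply to sequence ID 1.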
| </section> | ||||
| <!-- [auth] Retry and Replay --> | ||||
| <section anchor="sessions_callback_races" numbered="true" toc="default"> | ||||
| <name>Resolving Server Callback Races</name> | ||||
| <t> | ||||
| It is possible for server callbacks to arrive at the | ||||
| client before the reply from related fore channel | ||||
| operations. For example, a client may have been | ||||
| granted a delegation to a file it has opened, but the | ||||
| reply to the OPEN (informing the client of the | ||||
| granting of the delegation) may be delayed in the | ||||
| network. If a conflicting operation arrives at the | ||||
| server, it will recall the delegation using the | ||||
| backchannel, which may be on a different | ||||
| transport connection, perhaps even a different | ||||
| network, or even a different session associated with | ||||
| the same client ID. | ||||
| </t> | ||||
| <t> | ||||
| The presence of a session between the client and server | ||||
| alleviates this issue. When a session is in place, | ||||
| each client request is uniquely identified by its { | ||||
| session ID, slot ID, sequence ID } triple. By the rules under which | ||||
| slot entries (reply cache entries) are | ||||
| retired, the server knows whether the client | ||||
| has "seen" each of the server's replies. The server | ||||
| can therefore provide sufficient information to the | ||||
| client to allow it to distinguish an | ||||
| erroneous or conflicting callback from a callback race | ||||
| condition. | ||||
| </t> | ||||
| <t> | ||||
| For each client operation that might result in some | ||||
| sort of server callback, the server <bcp14>SHOULD</bcp14> "remember" | ||||
| the { session ID, slot ID, sequence ID } triple of the client request | ||||
| until the slot ID retirement rules allow the server to | ||||
| determine that the client has, in fact, seen the | ||||
| server's reply. Until the time the { session ID, slot ID, | ||||
| sequence ID } request triple can be retired, any recalls | ||||
| of the associated object <bcp14>MUST</bcp14> carry an array of these | ||||
| referring identifiers (in the CB_SEQUENCE operation's | ||||
| arguments), for the benefit of the client. After this | ||||
| time, it is not necessary for the server to provide | ||||
| this information in related callbacks, since it is | ||||
| certain that a race condition can no longer occur. | ||||
| </t> | ||||
| <t> | ||||
| The CB_SEQUENCE operation that begins each server | ||||
| callback carries a list of "referring" { session ID, slot ID, | ||||
| sequence ID } triples. If the client finds the request | ||||
| corresponding to the referring session ID, slot ID, and sequence ID | ||||
| to be currently outstanding (i.e., the server's reply has | ||||
| not been seen by the client), it can determine that | ||||
| the callback has raced the reply, and act | ||||
| accordingly. If the client does not find the request | ||||
| corresponding to the referring triple to be outstanding (including | ||||
| the case of a session ID referring to a destroyed session), | ||||
| then there is no race with respect to this triple. | ||||
| The server <bcp14>SHOULD</bcp14> limit the referring triples | ||||
| to those for requests that apply to the objects | ||||
| referred to in | ||||
| the CB_COMPOUND procedure. | ||||
| </t> | ||||
| <t> | ||||
| The client must not simply wait forever for the | ||||
| expected server reply to arrive before responding to the | ||||
| CB_COMPOUND that won the race, | ||||
| because it is possible | ||||
| that it will be delayed indefinitely. The client should | ||||
| assume the likely case that the reply will arrive within | ||||
| the average round-trip time for COMPOUND requests to the | ||||
| server, and wait that period of time. If | ||||
| that period of time | ||||
| expires, it can respond to the CB_COMPOUND with | ||||
| NFS4ERR_DELAY. There are other scenarios under which callbacks | ||||
| may race replies. | ||||
| Among them are pNFS layout recalls as described in | ||||
| <xref target="pnfs_operation_sequencing" format="default"/>. | ||||
| </t> | ||||
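On the client side, the check against the referring triples might look like the following sketch. The function name and the representation of a triple as a Python tuple are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# (session ID, slot ID, sequence ID)
Triple = Tuple[bytes, int, int]


def callback_raced_reply(referring: Iterable[Triple],
                         outstanding: Set[Triple]) -> bool:
    """Return True if any referring triple names a request still awaiting a reply."""
    return any(triple in outstanding for triple in referring)
```

If this returns True, the client would wait for roughly one COMPOUND round-trip time for the reply before answering the CB_COMPOUND with NFS4ERR_DELAY; a triple that refers to a destroyed session is never in the outstanding set, so it indicates no race.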
| </section> | ||||
| <!-- [auth] Resolving server callback races with sessions --> | ||||
| <section anchor="COMPOUND_Sizing_Issues" numbered="true" toc="default"> | ||||
| <name>COMPOUND and CB_COMPOUND Construction Issues</name> | ||||
| <t> | ||||
| Very large requests and replies may pose both buffer | ||||
| management issues (especially with RDMA) and reply | ||||
| cache issues. When the session is created | ||||
| (<xref target="OP_CREATE_SESSION" format="default"/>), for each channel (fore and | ||||
| back), the client and server | ||||
| negotiate the maximum-sized request they will | ||||
| send or process (ca_maxrequestsize), the maximum-sized reply | ||||
| they will return or process (ca_maxresponsesize), and the | ||||
| maximum-sized reply they will store in the reply cache | ||||
| (ca_maxresponsesize_cached). | ||||
| </t> | ||||
| <t> | ||||
| If a request exceeds ca_maxrequestsize, the reply will | ||||
| have the status NFS4ERR_REQ_TOO_BIG. A replier <bcp14>MAY</bcp14> | ||||
| return NFS4ERR_REQ_TOO_BIG as the status for the first operation | ||||
| (SEQUENCE or CB_SEQUENCE) in the request (which means that | ||||
| no operations in the request executed and that the | ||||
| state of the slot in the reply cache is unchanged), or it <bcp14>MAY</bcp14> | ||||
| opt to return it on a subsequent operation in the same | ||||
| COMPOUND or CB_COMPOUND request (which means that at least one | ||||
| operation did execute and that the state of the slot in the reply cache does | ||||
| change). The replier <bcp14>SHOULD</bcp14> set NFS4ERR_REQ_TOO_BIG on the | ||||
| operation that exceeds ca_maxrequestsize. | ||||
| </t> | ||||
| <t> | ||||
| If a reply exceeds ca_maxresponsesize, the reply will | ||||
| have the status NFS4ERR_REP_TOO_BIG. A replier <bcp14>MAY</bcp14> | ||||
| return NFS4ERR_REP_TOO_BIG as the status for the first operation | ||||
| (SEQUENCE or CB_SEQUENCE) in the request, or it <bcp14>MAY</bcp14> | ||||
| opt to return it on a subsequent operation (in the same | ||||
| COMPOUND or CB_COMPOUND reply). A replier <bcp14>MAY</bcp14> return NFS4ERR_REP_TOO_BIG | ||||
| in the reply to SEQUENCE or CB_SEQUENCE, even if the response | ||||
| would still exceed ca_maxresponsesize. | ||||
| </t> | ||||
| <t> | ||||
| If sa_cachethis or csa_cachethis is TRUE, then the | ||||
| replier <bcp14>MUST</bcp14> cache a reply except if an error is | ||||
| returned by the SEQUENCE or CB_SEQUENCE operation (see | ||||
| <xref target="err_sequence" format="default"/>). If the reply exceeds | ||||
| ca_maxresponsesize_cached (and sa_cachethis or | ||||
| csa_cachethis is TRUE), then the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE. Even if | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for | ||||
| that matter) is returned on an operation other than the | ||||
| first operation (SEQUENCE or CB_SEQUENCE), | ||||
| the reply <bcp14>MUST</bcp14> be cached if sa_cachethis or | ||||
| csa_cachethis is TRUE. | ||||
| For example, if a COMPOUND has eleven | ||||
| operations, including SEQUENCE, the fifth operation is | ||||
| a RENAME, and the tenth operation is a READ for one | ||||
| million bytes, the server may return | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. | ||||
| Since the server executed several operations, especially | ||||
| the non-idempotent RENAME, the client's request to | ||||
| cache the reply needs to be honored for | ||||
| exactly once semantics to operate correctly. If the | ||||
| client retries the request, the server will have cached | ||||
| a reply that contains results for ten of the eleven requested | ||||
| operations, with | ||||
| the tenth operation having a status of NFS4ERR_REP_TOO_BIG_TO_CACHE. | ||||
| </t> | ||||
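The size limits negotiated at session creation can be applied as in the sketch below. The two function names are hypothetical, the parameters mirror the ca_* channel attributes, and the sketch deliberately leaves open which operation in the request carries the resulting status.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_REQ_TOO_BIG = "NFS4ERR_REQ_TOO_BIG"
NFS4ERR_REP_TOO_BIG = "NFS4ERR_REP_TOO_BIG"
NFS4ERR_REP_TOO_BIG_TO_CACHE = "NFS4ERR_REP_TOO_BIG_TO_CACHE"


def request_size_status(request_size: int, ca_maxrequestsize: int) -> str:
    """Check an incoming request against the negotiated request limit."""
    if request_size > ca_maxrequestsize:
        return NFS4ERR_REQ_TOO_BIG
    return NFS4_OK


def reply_size_status(reply_size: int, ca_maxresponsesize: int,
                      ca_maxresponsesize_cached: int, cachethis: bool) -> str:
    """Check a prospective reply against the negotiated reply limits."""
    if reply_size > ca_maxresponsesize:
        return NFS4ERR_REP_TOO_BIG
    if cachethis and reply_size > ca_maxresponsesize_cached:
        # Only relevant when the requester asked for the reply to be cached.
        return NFS4ERR_REP_TOO_BIG_TO_CACHE
    return NFS4_OK
```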
| <t> | ||||
| A client needs to take care that, when sending | ||||
| operations that change the current filehandle (except for | ||||
| PUTFH, PUTPUBFH, PUTROOTFH, and RESTOREFH), it | ||||
| does not exceed the maximum reply buffer before the GETFH | ||||
| operation. Otherwise, the client will have to retry | ||||
| the operation that changed the current filehandle, in order | ||||
| to obtain the desired filehandle. | ||||
| For the OPEN operation (see <xref target="OP_OPEN" format="default"/>), | ||||
| retry is not always available as an option. | ||||
| The following guidelines for the handling of | ||||
| filehandle-changing operations are advised: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Within the same COMPOUND procedure, a client | ||||
| <bcp14>SHOULD</bcp14> send GETFH immediately after a current | ||||
| filehandle-changing operation. A client | ||||
| <bcp14>MUST</bcp14> send GETFH after a current filehandle-changing operation | ||||
| that is also non-idempotent (e.g., the OPEN operation), unless | ||||
| the operation is RESTOREFH. RESTOREFH is | ||||
| an exception, because even though it is | ||||
| non-idempotent, the filehandle RESTOREFH | ||||
| produced originated from an operation that | ||||
| is either idempotent (e.g., PUTFH, LOOKUP), | ||||
| or non-idempotent (e.g., OPEN, CREATE). If the | ||||
| origin is non-idempotent, then because the client | ||||
| <bcp14>MUST</bcp14> send GETFH after the origin operation, the | ||||
| client can recover if RESTOREFH returns an error. | ||||
| </li> | ||||
| <li> | ||||
| A server <bcp14>MAY</bcp14> return NFS4ERR_REP_TOO_BIG or | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) | ||||
| on a filehandle-changing operation if the reply would | ||||
| be too large on the next operation. | ||||
| </li> | ||||
| <li> | ||||
| A server <bcp14>SHOULD</bcp14> return NFS4ERR_REP_TOO_BIG or | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) | ||||
| on a filehandle-changing, non-idempotent operation if the reply would | ||||
| be too large on the next operation, especially if the operation | ||||
| is OPEN. | ||||
| </li> | ||||
| <li> | ||||
| A server <bcp14>MAY</bcp14> return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent | ||||
| current filehandle-changing operation, if | ||||
| it looks at the next operation (in the same COMPOUND procedure) | ||||
| and finds it is | ||||
| not GETFH. The server <bcp14>SHOULD</bcp14> do this if it is unable to | ||||
| determine in advance whether the total response size | ||||
| would exceed ca_maxresponsesize_cached or ca_maxresponsesize. | ||||
| </li> | ||||
| </ul> | ||||
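The look-ahead described in the last guideline could be sketched as follows. The set of non-idempotent filehandle-changing operations is an illustrative subset, and unsafe_compound_index is a hypothetical helper name.

```python
from typing import List, Optional

# Illustrative subset of non-idempotent operations that change the current filehandle.
NON_IDEMPOTENT_FH_CHANGING = {"OPEN", "CREATE"}


def unsafe_compound_index(ops: List[str]) -> Optional[int]:
    """Return the index of the first operation that could be answered with
    NFS4ERR_UNSAFE_COMPOUND, or None if every such operation is followed by GETFH."""
    for i, op in enumerate(ops):
        if op in NON_IDEMPOTENT_FH_CHANGING:
            following = ops[i + 1] if i + 1 < len(ops) else None
            if following != "GETFH":
                return i
    return None
```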
| </section> | ||||
| <!-- [auth] COMPOUND and CB_COMPOUND Construction Issues --> | ||||
| <section anchor="Persistence" numbered="true" toc="default"> | ||||
| <name>Persistence</name> | ||||
| <t> | ||||
| Since the reply cache is bounded, it is practical for | ||||
| the reply cache to persist across server restarts. | ||||
| The replier <bcp14>MUST</bcp14> persist the following information | ||||
| if it agreed to persist the session (when the session | ||||
| was created; see <xref target="OP_CREATE_SESSION" format="default"/>): | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The session ID. | ||||
| </li> | ||||
| <li> | ||||
| The slot table including the sequence ID and cached reply for | ||||
| each slot. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The above are sufficient for a replier to provide EOS semantics | ||||
| for any requests that were sent and executed before the server | ||||
| restarted. | ||||
| If the replier is a client, then there is no need for | ||||
| it to persist any more information, unless the client will | ||||
| be persisting all other state across client restart, in which case, | ||||
| the server will never see any NFSv4.1-level protocol manifestation | ||||
| of a client restart. | ||||
| If the replier is a server, with just the | ||||
| slot table and session ID persisting, | ||||
| any requests the client retries after the server restart will | ||||
| return the results that are cached in the reply cache, | ||||
| and any new requests (i.e., the sequence ID is one greater than the | ||||
| slot's sequence ID) <bcp14>MUST</bcp14> be rejected with NFS4ERR_DEADSESSION | ||||
| (returned by SEQUENCE). Such a session is considered dead. | ||||
| A server <bcp14>MAY</bcp14> re-animate a session | ||||
| after a server restart so that the session will accept new | ||||
| requests as well as retries. To re-animate a session, | ||||
| the server needs to persist additional information | ||||
| through server restart: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client ID. This is a prerequisite to let the client | ||||
| create more sessions associated with the same client ID | ||||
| as the re-animated session. | ||||
| </li> | ||||
| <li> | ||||
| The client ID's sequence ID that is used for creating | ||||
| sessions (see Sections <xref target="OP_EXCHANGE_ID" format="counter"/> and | ||||
| <xref target="OP_CREATE_SESSION" format="counter"/>). This is a | ||||
| prerequisite to let the client create more sessions. | ||||
| </li> | ||||
| <li> | ||||
| The principal that created the client ID. This | ||||
| allows the server to authenticate the client when | ||||
| it sends EXCHANGE_ID. | ||||
| </li> | ||||
| <li> | ||||
| The SSV, if SP4_SSV state protection was | ||||
| specified when the client ID was created (see <xref target="OP_EXCHANGE_ID" format="default"/>). This lets the | ||||
| client create new sessions, and associate connections | ||||
| with the new and existing sessions. | ||||
| </li> | ||||
| <li> | ||||
| The properties of the client ID as defined in | ||||
| <xref target="OP_EXCHANGE_ID" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| A persistent reply cache places certain demands on the server. | ||||
| The execution of the sequence of operations (starting with SEQUENCE) | ||||
| and placement of its results in the persistent cache <bcp14>MUST</bcp14> be atomic. If | ||||
| a client retries a sequence of operations that was previously | ||||
| executed on the server, the only acceptable outcomes are either | ||||
| the original cached reply or an indication that the client ID | ||||
| or session has been lost (indicating a catastrophic loss | ||||
| of the reply cache or a session that has been deleted because | ||||
| the client failed to use the session for an extended period | ||||
| of time). | ||||
| </t> | ||||
| <t> | ||||
| A server could fail and restart in the middle of a | ||||
| COMPOUND procedure that contains one or more non-idempotent | ||||
| or idempotent-but-modifying operations. This creates | ||||
| an even greater challenge for atomic execution and | ||||
| placement of results in the reply cache. One way | ||||
| to view the problem is as a single transaction consisting of | ||||
| each operation in the COMPOUND followed by storing | ||||
| the result in persistent storage, then finally a transaction | ||||
| commit. If there is a failure before the transaction | ||||
| is committed, then the server rolls back the transaction. | ||||
| If the server itself fails, then when it restarts, its | ||||
| recovery logic could roll back the transaction | ||||
| before starting the NFSv4.1 server. | ||||
| </t> | ||||
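| <t> | ||||
| The transactional view above can be illustrated with a minimal sketch. | ||||
| It is not part of the protocol: the SQLite store, the table layout, and | ||||
| the names open_store, increment, and execute_compound are all | ||||
| hypothetical. The sketch assumes that the operations and the cached reply | ||||
| share one persistent store, so that a single commit covers both. | ||||
| </t> | ||||

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a hypothetical persistent store holding file state and the reply cache."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS counter (n INTEGER)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS reply_cache ("
        "session_id TEXT, slot_id INTEGER, seq_id INTEGER, reply TEXT, "
        "PRIMARY KEY (session_id, slot_id))"
    )
    if db.execute("SELECT COUNT(*) FROM counter").fetchone()[0] == 0:
        db.execute("INSERT INTO counter VALUES (0)")
    db.commit()
    return db


def increment(db):
    """A stand-in for a non-idempotent operation."""
    db.execute("UPDATE counter SET n = n + 1")
    return db.execute("SELECT n FROM counter").fetchone()[0]


def execute_compound(db, session_id, slot_id, seq_id, operations):
    """Execute the operations and cache the reply as one transaction."""
    row = db.execute(
        "SELECT reply FROM reply_cache "
        "WHERE session_id = ? AND slot_id = ? AND seq_id = ?",
        (session_id, slot_id, seq_id),
    ).fetchone()
    if row is not None:
        return row[0]  # a retry: return the original cached reply
    with db:  # commits on success, rolls back if any operation raises
        results = [op(db) for op in operations]
        reply = repr(results)
        db.execute(
            "INSERT OR REPLACE INTO reply_cache VALUES (?, ?, ?, ?)",
            (session_id, slot_id, seq_id, reply),
        )
    return reply
```

| <t> | ||||
| In this sketch, a failure before the commit leaves neither the effects | ||||
| of the operations nor a cached reply behind, and a retry after the commit | ||||
| returns the stored reply without re-executing anything. | ||||
| </t> | ||||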
| <t> | ||||
| While the description of the | ||||
| implementation for atomic execution of the request | ||||
| and caching of the reply | ||||
| is beyond the scope of this document, an example implementation | ||||
| for NFSv2 <xref target="RFC1094" format="default"/> is described in <xref target="ha_nfs_ibm" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Persistence --> | ||||
| </section> | ||||
| <!-- [auth] Exactly Once Semantics --> | ||||
| <section anchor="RDMA_Considerations" numbered="true" toc="default"> | ||||
| <name>RDMA Considerations</name> | ||||
| <t> | ||||
| A complete discussion of the operation of RPC-based | ||||
| protocols over RDMA transports is in <xref target="RFC8166" format="default"/>. A | ||||
| discussion of the operation of NFSv4, including NFSv4.1, | ||||
| over RDMA is in <xref target="RFC8267" format="default"/>. Where RDMA is considered, | ||||
| this specification assumes the use of such a layering; | ||||
| it addresses only the upper-layer issues relevant to | ||||
| making best use of RPC/RDMA. | ||||
| </t> | ||||
| <section anchor="RDMA_Connection_Resources" numbered="true" toc="default"> | ||||
| <name>RDMA Connection Resources</name> | ||||
| <t> | ||||
| RDMA requires its consumers to register memory and post | ||||
| buffers of a specific size and number for receive | ||||
| operations. | ||||
| </t> | ||||
| <t> | ||||
| Registration of memory can be a relatively high-overhead operation, | ||||
| since it requires pinning of buffers, assignment of attributes | ||||
| (e.g., readable/writable), and initialization of hardware | ||||
| translation. Preregistration is desirable to reduce overhead. | ||||
| These registrations are specific to hardware interfaces and even to | ||||
| RDMA connection endpoints; therefore, negotiation of their limits is | ||||
| desirable to manage resources effectively. | ||||
| </t> | ||||
| <t> | ||||
| Following basic registration, these buffers must be posted by | ||||
| the RPC layer to handle receives. These buffers remain in use by | ||||
| the RPC/NFSv4.1 implementation; the size and number of them must be | ||||
| known to the remote peer in order to avoid RDMA errors that would | ||||
| cause a fatal error on the RDMA connection. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 manages slots as resources on a per-session | ||||
| basis (see <xref target="Session" format="default"/>), while RDMA | ||||
| connections manage credits on a per-connection basis. | ||||
| This means that in order for a peer to send data over | ||||
| RDMA to a remote buffer, it has to have both an NFSv4.1 | ||||
| slot and an RDMA credit. If multiple RDMA connections | ||||
| are associated with a session, then if the total number | ||||
| of credits across all RDMA connections associated with | ||||
| the session is X, and the number of slots in the session | ||||
| is Y, then the maximum number of outstanding requests | ||||
| is the lesser of X and Y. | ||||
| </t> | ||||
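| <t> | ||||
| The relationship between credits and slots can be written as a short | ||||
| calculation. The function name and its parameters are illustrative only | ||||
| and do not correspond to any protocol element. | ||||
| </t> | ||||

```python
def max_outstanding_requests(credits_per_connection, session_slots):
    """Return the lesser of total RDMA credits (X) and session slots (Y)."""
    total_credits = sum(credits_per_connection)  # X, summed over all connections
    return min(total_credits, session_slots)     # lesser of X and Y


# Two RDMA connections with 8 and 4 credits, and a session with 16 slots:
print(max_outstanding_requests([8, 4], 16))  # prints 12
```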
| </section> | ||||
| <!-- [auth] RDMA Connection Resources --> | ||||
| <section anchor="Flow_Control" numbered="true" toc="default"> | ||||
| <name>Flow Control</name> | ||||
| <t> | ||||
| Previous versions of NFS do not provide flow control; | ||||
| instead, they rely on the windowing provided by | ||||
| transports like TCP to throttle requests. This does | ||||
| not work with RDMA, which provides no operation flow | ||||
| control and will terminate a connection in error when | ||||
| limits are exceeded. | ||||
| Limits such as maximum number of requests | ||||
| outstanding are therefore negotiated when a session | ||||
| is created (see the ca_maxrequests field in <xref target="OP_CREATE_SESSION" format="default"/>). These limits then | ||||
| provide the maxima within which each connection associated | ||||
| with the session's channel(s) must remain. | ||||
| RDMA connections are managed within these limits as | ||||
| described in <xref target="RFC8166" sectionFormat="of" section="3.3"/>; if there are multiple | ||||
| RDMA connections, then the maximum number of requests | ||||
| for a channel will be divided among the RDMA | ||||
| connections. Put a different way, the onus is on the | ||||
| replier to ensure that the total number of RDMA credits | ||||
| across all connections associated with the replier's | ||||
| channel does not exceed the channel's maximum number of | ||||
| outstanding requests. | ||||
| </t> | ||||
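| <t> | ||||
| One possible way for a replier to divide a channel's request limit among | ||||
| its RDMA connections is sketched below. The even split is an assumption | ||||
| made for illustration; this document does not prescribe a division | ||||
| policy, and the function name divide_credits is hypothetical. | ||||
| </t> | ||||

```python
def divide_credits(max_requests, num_connections):
    """Split a channel's request limit across its RDMA connections.

    The per-connection grants sum to exactly max_requests, so the total
    number of credits stays within the channel's negotiated maximum.
    """
    base, extra = divmod(max_requests, num_connections)
    # The first `extra` connections receive one additional credit each.
    return [base + 1] * extra + [base] * (num_connections - extra)


grants = divide_credits(16, 3)
print(grants)       # prints [6, 5, 5]
print(sum(grants))  # prints 16
```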
| <t> | ||||
| The limits may also be modified | ||||
| dynamically at the replier's choosing by manipulating | ||||
| certain parameters present in each NFSv4.1 reply. In | ||||
| addition, the CB_RECALL_SLOT callback operation (see | ||||
| <xref target="OP_CB_RECALL_SLOT" format="default"/>) can be sent by | ||||
| a server to a client to return RDMA credits to the | ||||
| server, thereby lowering the maximum number of requests | ||||
| a client can have outstanding to the server. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Flow Control --> | ||||
| <section anchor="Padding" numbered="true" toc="default"> | ||||
| <name>Padding</name> | ||||
| <t> | ||||
| Header padding is requested by each peer at session initiation | ||||
| (see the ca_headerpadsize argument to CREATE_SESSION in | ||||
| <xref target="OP_CREATE_SESSION" format="default"/>), and | ||||
| subsequently used by the RPC RDMA layer, as described in <xref target="RFC8166" format="default"/>. | ||||
| Zero padding is permitted. | ||||
| </t> | ||||
| <t> | ||||
| Padding leverages the useful property | ||||
| that RDMA preserves alignment of data, even when the data is | ||||
| placed into anonymous (untagged) buffers. If requested, client | ||||
| inline writes will insert appropriate pad bytes within the request | ||||
| header to align the data payload on the specified boundary. The | ||||
| client is encouraged to add sufficient padding (up to the | ||||
| negotiated size) so that | ||||
| the "data" field of the WRITE operation | ||||
| is aligned. | ||||
| Most servers can make good use of such padding, | ||||
| which allows them to chain receive buffers in such a way that any | ||||
| data carried by client requests will be placed into appropriate | ||||
| buffers at the server, ready for file system processing. The | ||||
| receiver's RPC layer encounters no overhead from skipping over pad | ||||
| bytes, and the RDMA layer's high performance makes the insertion | ||||
| and transmission of padding on the sender a significant | ||||
| optimization. In this way, the need for servers to perform RDMA | ||||
| Read to satisfy all but the largest client writes is obviated. An | ||||
| added benefit is the reduction of message round trips on the network | ||||
| -- a potentially good trade, where latency is present. | ||||
| </t> | ||||
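| <t> | ||||
| The pad computation a client might perform can be sketched as follows. | ||||
| The function and parameter names are hypothetical. The sketch also | ||||
| assumes that a client inserts no padding at all when the padding needed | ||||
| for alignment exceeds the negotiated size, which is a choice made only | ||||
| for this example. | ||||
| </t> | ||||

```python
def header_pad_bytes(header_len, alignment, negotiated_pad_size):
    """Return the pad bytes to insert after the request header.

    The padding moves the start of the data payload to the next multiple
    of `alignment`. If that would need more than the negotiated pad size,
    this sketch inserts no padding (an assumption, see the lead-in).
    """
    needed = (-header_len) % alignment  # distance to the next boundary
    return needed if negotiated_pad_size >= needed else 0


# A 140-byte header, 64-byte alignment, and up to 64 bytes of padding:
print(header_pad_bytes(140, 64, 64))  # prints 52
```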
| <t> | ||||
| The value to choose for padding is subject to a number of criteria. | ||||
| A primary source of variable-length data in the RPC header is the | ||||
| authentication information, the form of which is client-determined, | ||||
| possibly in response to server specification. The contents of | ||||
| COMPOUNDs, sizes of strings such as those passed to RENAME, etc. all | ||||
| go into the determination of a maximal NFSv4.1 request size and | ||||
| therefore minimal buffer size. The client must select its offered | ||||
| value carefully, so as to avoid overburdening the server, and vice | ||||
| versa. The benefit of an appropriate padding value is higher | ||||
| performance. | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| Sender gather: | ||||
| |RPC Request|Pad bytes|Length| -> |User data...| | ||||
| \------+----------------------/ \ | ||||
| \ \ | ||||
| \ Receiver scatter: \-----------+- ... | ||||
| /-----+----------------\ \ \ | ||||
| |RPC Request|Pad|Length| -> |FS buffer|->|FS buffer|->... | ||||
| ]]></artwork> | ||||
| <t> | ||||
| In the above case, the server may recycle unused buffers to the | ||||
| next posted receive if unused by the actual received request, or | ||||
| may pass the now-complete buffers by reference for normal write | ||||
| processing. For a server that can make use of it, this removes | ||||
| any need for data copies of incoming data, without resorting to | ||||
| complicated end-to-end buffer advertisement and management. This | ||||
| includes most kernel-based and integrated server designs, among | ||||
| many others. The client may perform similar optimizations, if | ||||
| desired. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Padding --> | ||||
| <section anchor="dual" numbered="true" toc="default"> | ||||
| <name>Dual RDMA and Non-RDMA Transports</name> | ||||
| <t> | ||||
| Some RDMA transports (e.g., RFC 5040 <xref target="RFC5040" format="default"/>) | ||||
| permit a "streaming" (non-RDMA) phase, | ||||
| where ordinary traffic might flow before "stepping up" | ||||
| to RDMA mode, commencing RDMA traffic. Some RDMA | ||||
| transports start connections always in RDMA mode. | ||||
| NFSv4.1 allows, but does not assume, a streaming phase | ||||
| before RDMA mode. When a connection | ||||
| is associated with a session, the client and server negotiate whether the | ||||
| connection is used in RDMA or non-RDMA mode (see Sections | ||||
| <xref target="OP_CREATE_SESSION" format="counter"/> and | ||||
| <xref target="OP_BIND_CONN_TO_SESSION" format="counter"/>). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] RDMA Transports --> | ||||
| </section> | ||||
| <!-- [auth] RDMA Considerations --> | ||||
| <section anchor="Sessions_Security" numbered="true" toc="default"> | ||||
| <name>Session Security</name> | ||||
| <section anchor="Session_Callback_Security" numbered="true" toc="default"> | ||||
| <name>Session Callback Security</name> | ||||
| <t> | ||||
| Via session/connection association, NFSv4.1 improves security over | ||||
| that provided by NFSv4.0 for the backchannel. The | ||||
| connection is client-initiated (see | ||||
| <xref target="OP_BIND_CONN_TO_SESSION" format="default"/>) and subject to the same | ||||
| firewall and routing checks as the fore channel. | ||||
| At the client's option (see <xref target="OP_EXCHANGE_ID" format="default"/>), | ||||
| connection association is fully authenticated before being | ||||
| activated (see <xref target="OP_BIND_CONN_TO_SESSION" format="default"/>). | ||||
| Traffic from the server over the | ||||
| backchannel is authenticated exactly as the client specifies | ||||
| (see <xref target="Backchannel_RPC_Security" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Session Callback Security --> | ||||
| <section anchor="Backchannel_RPC_Security" numbered="true" toc="default"> | ||||
| <name>Backchannel RPC Security</name> | ||||
| <t> | ||||
| When the NFSv4.1 client establishes the backchannel, it | ||||
| informs the server of the security flavors and principals | ||||
| to use when sending requests. If the security flavor is | ||||
| RPCSEC_GSS, the client expresses the principal in the form | ||||
| of an established RPCSEC_GSS context. The server is free | ||||
| to use any of the flavor/principal combinations the client | ||||
| offers, but it <bcp14>MUST NOT</bcp14> use unoffered combinations. | ||||
| This way, the client need not provide a target | ||||
| GSS principal for the backchannel as it did with | ||||
| NFSv4.0, nor does the server have to implement an | ||||
| RPCSEC_GSS initiator as it did with NFSv4.0 <xref target="RFC3530" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The CREATE_SESSION (<xref target="OP_CREATE_SESSION" format="default"/>) | ||||
| and BACKCHANNEL_CTL (<xref target="OP_BACKCHANNEL_CTL" format="default"/>) | ||||
| operations allow the client to specify flavor/principal combinations. | ||||
| </t> | ||||
| <t> | ||||
| Also note that the SP4_SSV state protection mode | ||||
| (see Sections <xref target="OP_EXCHANGE_ID" format="counter"/> and <xref target="protect_state_change" format="counter"/>) has the side | ||||
| benefit of providing SSV-derived RPCSEC_GSS contexts (<xref target="ssv_mech" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Backchannel RPC Security --> | ||||
| <section anchor="protect_state_change" numbered="true" toc="default"> | ||||
| <name>Protection from Unauthorized State Changes</name> | ||||
| <t> | ||||
| As described to this point in the specification, the state model | ||||
| of NFSv4.1 is vulnerable to an attacker that | ||||
| sends a SEQUENCE operation with a forged session ID and with a slot ID that | ||||
| it expects the legitimate client to use next. When the legitimate client | ||||
| uses the slot ID with the same sequence number, the server | ||||
| returns the attacker's result from the reply cache, which | ||||
| disrupts the legitimate client and thus denies service to it. | ||||
| Similarly, an attacker could send a CREATE_SESSION with a forged | ||||
| client ID to create a new session associated with the client ID. | ||||
| The attacker could send requests using the new session that | ||||
| change locking state, such as LOCKU operations to release locks | ||||
| the legitimate client has acquired. Setting a security | ||||
| policy on the file that requires RPCSEC_GSS credentials when | ||||
| manipulating the file's state is one potential workaround, | ||||
| but has the disadvantage of preventing a legitimate client from | ||||
| releasing state when RPCSEC_GSS is required to do so, but | ||||
| a GSS context cannot be obtained (possibly because the user | ||||
| has logged off the client). | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 provides three options to a client for state protection, | ||||
| which are specified when a client creates | ||||
| a client ID via EXCHANGE_ID (<xref target="OP_EXCHANGE_ID" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The first (SP4_NONE) is to simply waive state protection. | ||||
| </t> | ||||
| <t> | ||||
| The other two options (SP4_MACH_CRED and SP4_SSV) | ||||
| share several traits: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| An RPCSEC_GSS-based credential is used to authenticate | ||||
| client ID and session maintenance operations, | ||||
| including creating and destroying a session, | ||||
| associating a connection with the session, and | ||||
| destroying the client ID. | ||||
| </li> | ||||
| <li> | ||||
| Because RPCSEC_GSS is used to authenticate | ||||
| client ID and session maintenance, the attacker cannot | ||||
| associate a rogue connection with a legitimate session, or | ||||
| associate a rogue session with a legitimate client ID in | ||||
| order to maliciously alter the client ID's lock state | ||||
| via CLOSE, LOCKU, DELEGRETURN, LAYOUTRETURN, etc. | ||||
| </li> | ||||
| <li> | ||||
| In cases where the server's security policies on a | ||||
| portion of its namespace require RPCSEC_GSS authentication, | ||||
| a client may have to use an RPCSEC_GSS credential | ||||
| to remove per-file state (e.g., LOCKU, CLOSE, etc.). | ||||
| The server may require that the principal that removes | ||||
| the state match certain criteria (e.g., | ||||
| the principal might have to be the same as the one | ||||
| that acquired the state). However, the client might | ||||
| not have an RPCSEC_GSS context for such a principal, | ||||
| and might not be able to create such a context (perhaps | ||||
| because the user has logged off). When the client | ||||
| establishes SP4_MACH_CRED or SP4_SSV protection, | ||||
| it can specify a list of operations that the server <bcp14>MUST</bcp14> | ||||
| allow using the machine credential (if SP4_MACH_CRED | ||||
| is used) or the SSV credential (if SP4_SSV is used). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The SP4_MACH_CRED state protection option uses a machine | ||||
| credential where the principal that | ||||
| creates the client ID <bcp14>MUST</bcp14> also be the principal | ||||
| that performs client ID and session maintenance | ||||
| operations. | ||||
| The security of the machine credential state protection approach | ||||
| depends entirely on safeguarding the per-machine credential. | ||||
| Assuming a proper safeguard, using the per-machine credential | ||||
| for operations like CREATE_SESSION, BIND_CONN_TO_SESSION, | ||||
| DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker | ||||
| from associating a rogue connection with a session, or | ||||
| associating a rogue session with a client ID. | ||||
| </t> | ||||
| <t> | ||||
| There are at least three scenarios for the SP4_MACH_CRED | ||||
| option: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The system administrator configures a unique, | ||||
| permanent per-machine credential for one of the | ||||
| mandated GSS mechanisms (e.g., if Kerberos | ||||
| V5 is used, a "keytab" containing a principal derived from a | ||||
| client host name could be used). | ||||
| </li> | ||||
| <li> | ||||
| The client is used by a single user, and so the | ||||
| client ID and its sessions are used by just that | ||||
| user. If the user's credential expires, then session | ||||
| and client ID maintenance cannot occur, but since | ||||
| the client has a single user, only that user is | ||||
| inconvenienced. | ||||
| </li> | ||||
| <li> | ||||
| The physical client has multiple users, but the | ||||
| client implementation has a unique client ID for | ||||
| each user. This is effectively the same as the | ||||
| second scenario, but a disadvantage is that each | ||||
| user needs to be allocated at least one session each, | ||||
| so the approach suffers from lack of economy. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| The SP4_SSV protection option uses the SSV (<xref target="intro_definitions" format="default"/>), via RPCSEC_GSS and the SSV GSS | ||||
| mechanism (<xref target="ssv_mech" format="default"/>), to protect state from attack. | ||||
| The SP4_SSV protection option is intended for the situation | ||||
| in which a client has multiple active users and the system | ||||
| administrator wants to avoid the burden of installing a permanent | ||||
| machine credential on each client. The SSV is | ||||
| established and updated on the server via SET_SSV (see <xref target="OP_SET_SSV" format="default"/>). To prevent eavesdropping, | ||||
| a client <bcp14>SHOULD</bcp14> send SET_SSV via RPCSEC_GSS with | ||||
| the privacy service. Several aspects of the SSV | ||||
| make it intractable for an attacker to guess the SSV, | ||||
| and thus associate rogue connections with a session, | ||||
| and rogue sessions with a client ID: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The arguments to and results of SET_SSV include digests of the old and | ||||
| new SSV, respectively. | ||||
| </li> | ||||
| <li> | ||||
| Because the initial value of the SSV is zero, and | ||||
| therefore known, the client that opts for SP4_SSV | ||||
| protection and opts to apply SP4_SSV protection to | ||||
| BIND_CONN_TO_SESSION and CREATE_SESSION <bcp14>MUST</bcp14> send | ||||
| at least one SET_SSV operation before the first | ||||
| BIND_CONN_TO_SESSION operation or before the second | ||||
| CREATE_SESSION operation on a client ID. If it does | ||||
| not, the SSV mechanism will not generate tokens | ||||
| (<xref target="ssv_mech" format="default"/>). | ||||
| A client <bcp14>SHOULD</bcp14> send SET_SSV as soon as a session | ||||
| is created. | ||||
| </li> | ||||
| <li> | ||||
| A SET_SSV request does not replace the SSV with the argument to | ||||
| SET_SSV. Instead, the current SSV on the server is logically | ||||
| exclusive ORed (XORed) with the argument to SET_SSV. | ||||
| Each time a new principal uses a client ID for the first | ||||
| time, the client | ||||
| <bcp14>SHOULD</bcp14> send a SET_SSV with that principal's RPCSEC_GSS | ||||
| credentials, with RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. | ||||
| </li> | ||||
| </ul> | ||||
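| <t> | ||||
| The XOR update described in the last item above can be sketched as | ||||
| follows. The function name apply_set_ssv is hypothetical, and the sketch | ||||
| assumes that the SET_SSV argument has the same length as the current SSV. | ||||
| </t> | ||||

```python
def apply_set_ssv(current_ssv: bytes, set_ssv_arg: bytes) -> bytes:
    """Return the new SSV: the current SSV XORed with the SET_SSV argument."""
    if len(current_ssv) != len(set_ssv_arg):
        raise ValueError("this sketch assumes equal-length values")
    return bytes(a ^ b for a, b in zip(current_ssv, set_ssv_arg))


initial = bytes(4)  # the initial SSV is zero
after_first = apply_set_ssv(initial, b"\x01\x02\x03\x04")
after_second = apply_set_ssv(after_first, b"\xff\x00\xff\x00")
print(after_first.hex())   # prints 01020304
print(after_second.hex())  # prints fe02fc04
```

| <t> | ||||
| Because each update is combined with the existing value, a party that | ||||
| sees only a later SET_SSV argument does not learn the resulting SSV | ||||
| unless it also knows the value that preceded it. | ||||
| </t> | ||||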
| <t> | ||||
| Here are the types of attacks that can be attempted by an attacker named | ||||
| Eve on a victim named Bob, and how SP4_SSV protection foils | ||||
| each attack: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Suppose Eve is the first user to log into a | ||||
| legitimate client. Eve's use of an NFSv4.1 | ||||
| file system will cause the legitimate client to | ||||
| create a client ID | ||||
| with SP4_SSV protection, specifying that the BIND_CONN_TO_SESSION | ||||
| operation <bcp14>MUST</bcp14> use the SSV credential. Eve's use of | ||||
| the file system also causes an SSV to be created. The | ||||
| SET_SSV operation that creates the SSV will be protected by | ||||
| the RPCSEC_GSS context created by the legitimate | ||||
| client, which uses Eve's GSS principal and | ||||
| credentials. Eve can eavesdrop on the network while | ||||
| her RPCSEC_GSS context is created and the SET_SSV | ||||
| using her context is sent. Even if the legitimate | ||||
| client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, | ||||
| because Eve knows her own credentials, she can | ||||
| decrypt the SSV. Eve can compute an RPCSEC_GSS | ||||
| credential that BIND_CONN_TO_SESSION will accept, | ||||
| and so associate a new connection with the | ||||
| legitimate session. Eve can change the slot ID and | ||||
| sequence state of a legitimate session, and/or the | ||||
| SSV state, in such a way that when Bob accesses | ||||
| the server via the same legitimate client, the | ||||
| legitimate client will be unable to use the session. | ||||
| </t> | ||||
| <t> | ||||
| The client's only recourse is to create a new client | ||||
| ID for Bob to use, and establish a new SSV for the | ||||
| client ID. The client will be unable to delete | ||||
| the old client ID, and will let the lease on the old | ||||
| client ID expire. | ||||
| </t> | ||||
| <t> | ||||
| Once the legitimate client establishes an SSV over | ||||
| the new session using Bob's RPCSEC_GSS context, | ||||
| Eve can use the new session via the legitimate | ||||
| client, but she cannot disrupt Bob. Moreover, | ||||
| because the client <bcp14>SHOULD</bcp14> have modified the SSV | ||||
| due to Eve using the new session, Bob cannot get | ||||
| revenge on Eve by associating a rogue connection | ||||
| with the session. | ||||
| </t> | ||||
| <t> | ||||
| The question is how the legitimate client detects | ||||
| that Eve has hijacked the old session. When the | ||||
| client detects that a new principal, Bob, wants to | ||||
| use the session, it <bcp14>SHOULD</bcp14> have sent a SET_SSV, | ||||
| which leads to the following sub-scenarios: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Let us suppose that from the rogue connection, Eve | ||||
| sent a SET_SSV with the same slot ID and sequence ID that | ||||
| the legitimate client later uses. The server will | ||||
| assume the SET_SSV sent with Bob's credentials is a retry, | ||||
| and return to the legitimate | ||||
| client the reply it sent Eve. However, unless Eve can | ||||
| correctly guess the SSV the legitimate client will use, | ||||
| the digest verification checks in the SET_SSV response | ||||
| will fail. That is an indication to the client that the | ||||
| session has apparently been hijacked. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Alternatively, Eve sent a SET_SSV with a different slot ID than | ||||
| the legitimate client uses for its SET_SSV. Then the digest | ||||
| verification of the SET_SSV sent with Bob's credentials fails | ||||
| on the server, and the error returned to the client makes it | ||||
| apparent that the session has been hijacked. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Alternatively, Eve sent an operation other than SET_SSV, | ||||
| but with the same slot ID and sequence ID that the legitimate client | ||||
| uses for its SET_SSV. The server returns to the legitimate | ||||
| client the response it sent Eve. The client sees that the | ||||
| response is not at all what it expects. The client | ||||
| assumes either session hijacking or a server bug, and either way | ||||
| destroys the old session. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Eve associates a rogue connection with the session | ||||
| as above, and then destroys the session. Again, Bob | ||||
| goes to use the server from the legitimate client, | ||||
| which sends a SET_SSV using Bob's credentials. The client receives an error | ||||
| that indicates that the session does not exist. When | ||||
| the client tries to create a new session, this | ||||
| will fail because the SSV it has does not match that which the | ||||
| server has, and now the client knows the session | ||||
| was hijacked. The legitimate client establishes a | ||||
| new client ID. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| If Eve creates a connection before the legitimate | ||||
| client establishes an SSV, because the initial | ||||
| value of the SSV is zero and therefore known, | ||||
| Eve can send a SET_SSV that will pass the digest | ||||
| verification check. However, because the new | ||||
| connection has not been associated with the session, | ||||
| the SET_SSV is rejected for that reason. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In summary, an attacker's disruption of state when | ||||
| SP4_SSV protection is in use is limited to the | ||||
| formative period of a client ID, its first session, | ||||
| and the establishment of the SSV. Once a non-malicious | ||||
| user uses the client ID, the client quickly detects | ||||
| any hijack and rectifies the situation. Once a | ||||
| non-malicious user successfully modifies the SSV, | ||||
| the attacker cannot use NFSv4.1 operations to disrupt | ||||
| the non-malicious user. | ||||
| </t> | ||||
| <t> | ||||
| Note that neither the SP4_MACH_CRED nor | ||||
| SP4_SSV protection approaches prevent hijacking | ||||
| of a transport connection that has previously been | ||||
| associated with a session. If the goal of a counter-threat | ||||
| strategy is to prevent connection hijacking, the use of IPsec is <bcp14>RECOMMENDED</bcp14>. | ||||
| </t> | ||||
| <t> | ||||
| If a connection hijack occurs, the hijacker could in | ||||
| theory change locking state and negatively impact the | ||||
| service to legitimate clients. However, if the server | ||||
| is configured to require the use of RPCSEC_GSS with | ||||
| integrity or privacy on the affected file objects, and | ||||
| if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (<xref target="OP_EXCHANGE_ID" format="default"/>) is in force, this will | ||||
| thwart unauthorized attempts to change locking state. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Protection from Unauthorized State Changes --> | ||||
| </section> | ||||
| <!-- [auth] Sessions Security --> | ||||
| <section anchor="ssv_mech" numbered="true" toc="default"> | ||||
| <name>The Secret State Verifier (SSV) GSS Mechanism</name> | ||||
| <t> | ||||
| The SSV provides the secret key for a GSS mechanism internal to NFSv4.1 | ||||
| that NFSv4.1 uses for state protection. Contexts for this | ||||
| mechanism are not established via the RPCSEC_GSS | ||||
| protocol. Instead, the contexts are automatically | ||||
| created when EXCHANGE_ID specifies | ||||
| SP4_SSV protection. The only tokens | ||||
| defined are the PerMsgToken (emitted by GSS_GetMIC) | ||||
| and the SealedMessage token (emitted by GSS_Wrap). | ||||
| </t> | ||||
| <t> | ||||
| The mechanism OID for the SSV mechanism is | ||||
| iso.org.dod.internet.private.enterprise.Michael | ||||
| Eisler.nfs.ssv_mech (1.3.6.1.4.1.28882.1.1). While the | ||||
| SSV mechanism does not define any initial context | ||||
| tokens, the OID can be used to let servers indicate | ||||
| that the SSV mechanism is acceptable whenever the | ||||
| client sends a SECINFO or SECINFO_NO_NAME operation | ||||
| (see | ||||
| <xref target="Security_Service_Negotiation" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The SSV mechanism defines four subkeys derived from | ||||
| the SSV value. Each time SET_SSV is invoked, the subkeys | ||||
| are recalculated by the client and server. The | ||||
| calculation of each of the four subkeys depends on each | ||||
| of the four respective ssv_subkey4 enumerated values. The calculation | ||||
| uses the HMAC | ||||
| <xref target="RFC2104" format="default"/> algorithm, using the current SSV as the key, the one-way hash | ||||
| algorithm as negotiated by EXCHANGE_ID, | ||||
| and the input text as represented by the XDR encoded | ||||
| enumeration value for that subkey of data type ssv_subkey4. | ||||
| If the length of the output of the HMAC algorithm exceeds the length of the | ||||
| key of the encryption algorithm (which is also negotiated by EXCHANGE_ID), | ||||
| then the subkey <bcp14>MUST</bcp14> be truncated from the HMAC output, i.e., if the | ||||
| subkey is N bytes long, then the first N bytes of the HMAC output | ||||
| <bcp14>MUST</bcp14> be used for the subkey. The specification of EXCHANGE_ID | ||||
| states that the length of the output of the HMAC algorithm <bcp14>MUST NOT</bcp14> | ||||
| be less than the length of the subkey needed for the encryption algorithm | ||||
| (see <xref target="OP_EXCHANGE_ID" format="default"/>). | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* Input for computing subkeys */ | ||||
| enum ssv_subkey4 { | ||||
| SSV4_SUBKEY_MIC_I2T = 1, | ||||
| SSV4_SUBKEY_MIC_T2I = 2, | ||||
| SSV4_SUBKEY_SEAL_I2T = 3, | ||||
| SSV4_SUBKEY_SEAL_T2I = 4 | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The subkey derived from SSV4_SUBKEY_MIC_I2T | ||||
| is used for calculating message integrity codes (MICs) | ||||
| that originate from the NFSv4.1 client, whether as part | ||||
| of a request over the fore channel or a response | ||||
| over the backchannel. The subkey derived from | ||||
| SSV4_SUBKEY_MIC_T2I is used for MICs originating from the | ||||
| NFSv4.1 server. The subkey derived from SSV4_SUBKEY_SEAL_I2T | ||||
| is used for encrypting text originating from the NFSv4.1 | ||||
| client, and the subkey derived from SSV4_SUBKEY_SEAL_T2I | ||||
| is used for encrypting text originating from the | ||||
| NFSv4.1 server. | ||||
| </t> | ||||
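| <t> | ||||
| The subkey derivation described above can be sketched as follows. The | ||||
| sketch is illustrative only: the function name derive_subkey is | ||||
| hypothetical, and SHA-256 and the 16-byte key length stand in for | ||||
| whatever hash and encryption algorithms EXCHANGE_ID actually negotiated. | ||||
| </t> | ||||

```python
import hmac
import struct

# Values of the ssv_subkey4 enumeration.
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4


def derive_subkey(ssv: bytes, subkey_id: int, key_len: int,
                  hash_name: str = "sha256") -> bytes:
    """Derive one subkey from the current SSV.

    The HMAC key is the SSV; the input text is the XDR encoding of the
    ssv_subkey4 value (a four-byte big-endian integer). The output is
    truncated to its first key_len bytes.
    """
    text = struct.pack(">I", subkey_id)
    digest = hmac.new(ssv, text, hash_name).digest()
    if key_len > len(digest):
        raise ValueError("HMAC output is shorter than the required subkey")
    return digest[:key_len]


ssv = bytes(range(32))  # placeholder SSV value, for illustration only
subkeys = {
    subkey_id: derive_subkey(ssv, subkey_id, 16)
    for subkey_id in (SSV4_SUBKEY_MIC_I2T, SSV4_SUBKEY_MIC_T2I,
                      SSV4_SUBKEY_SEAL_I2T, SSV4_SUBKEY_SEAL_T2I)
}
```

| <t> | ||||
| Both peers would rerun this derivation after every SET_SSV, since the | ||||
| SSV used as the HMAC key changes each time. | ||||
| </t> | ||||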
| <t> | ||||
| The PerMsgToken description is based on an XDR definition: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* Input for computing smt_hmac */ | ||||
| struct ssv_mic_plain_tkn4 { | ||||
| uint32_t smpt_ssv_seq; | ||||
| opaque smpt_orig_plain<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* SSV GSS PerMsgToken token */ | ||||
| struct ssv_mic_tkn4 { | ||||
| uint32_t smt_ssv_seq; | ||||
| opaque smt_hmac<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The field smt_hmac is an HMAC calculated by using the | ||||
| subkey derived from SSV4_SUBKEY_MIC_I2T or | ||||
| SSV4_SUBKEY_MIC_T2I as the key, the one-way hash algorithm | ||||
| as negotiated by EXCHANGE_ID, and the input text | ||||
| as represented by data of type ssv_mic_plain_tkn4. | ||||
| The field smpt_ssv_seq is the same as smt_ssv_seq. | ||||
| The field smpt_orig_plain is the "message" input passed | ||||
| to GSS_GetMIC() (see <xref target="RFC2743" sectionFormat="of" section="2.3.1"/>). | ||||
| The caller of GSS_GetMIC() provides a pointer to a buffer | ||||
| containing the plain text. The SSV mechanism's entry point for | ||||
| GSS_GetMIC() encodes this into an opaque array, and the encoding | ||||
| will include an initial four-byte length, plus any necessary padding. | ||||
| Prepended to this will be the XDR encoded value of smpt_ssv_seq, | ||||
| thus making up an XDR encoding of a value of data type | ||||
| ssv_mic_plain_tkn4, which in turn is the input into the HMAC. | ||||
| </t> | ||||
| <t> | ||||
| The token emitted by GSS_GetMIC() is XDR encoded and | ||||
| of XDR data type ssv_mic_tkn4. The field smt_ssv_seq | ||||
| comes from the SSV sequence number, which is equal to | ||||
| one after SET_SSV (<xref target="OP_SET_SSV" format="default"/>) | ||||
| is called the first time on a client | ||||
| ID. | ||||
| Thereafter, the SSV sequence number is incremented on each SET_SSV. | ||||
| Thus, smt_ssv_seq represents the version of the SSV at | ||||
| the time GSS_GetMIC() was called. As noted in <xref target="OP_EXCHANGE_ID" format="default"/>, the client and server | ||||
| can maintain multiple concurrent versions of the SSV. | ||||
| This allows the SSV to be changed without serializing | ||||
| all RPC calls that use the SSV mechanism with SET_SSV | ||||
| operations. | ||||
| Once the HMAC is calculated, it is XDR encoded into | ||||
| smt_hmac, which will include an initial four-byte length, | ||||
| and any necessary padding. Prepended to this will be | ||||
| the XDR encoded value of smt_ssv_seq. | ||||
| </t> | ||||
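| <t> | ||||
| The construction of the PerMsgToken described above can be sketched as follows. This is a minimal, non-normative sketch: the helper names xdr_opaque and ssv_get_mic are hypothetical, and SHA-256 merely stands in for whichever one-way hash EXCHANGE_ID negotiated. | ||||
| </t> | ||||

```python
import hmac
import struct


def xdr_opaque(data):
    """XDR variable-length opaque: 4-byte length, data, zero pad to 4."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def ssv_get_mic(mic_subkey, ssv_seq, message, hash_name="sha256"):
    """Build an XDR-encoded ssv_mic_tkn4 for message (hypothetical helper)."""
    # ssv_mic_plain_tkn4: smpt_ssv_seq prepended to smpt_orig_plain.
    plain = struct.pack(">I", ssv_seq) + xdr_opaque(message)
    mac = hmac.new(mic_subkey, plain, hash_name).digest()
    # ssv_mic_tkn4: smt_ssv_seq prepended to smt_hmac.
    return struct.pack(">I", ssv_seq) + xdr_opaque(mac)
```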
| <t> | ||||
| The SealedMessage description is based on an XDR definition: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* Input for computing ssct_encr_data and ssct_hmac */ | ||||
| struct ssv_seal_plain_tkn4 { | ||||
| opaque sspt_confounder<>; | ||||
| uint32_t sspt_ssv_seq; | ||||
| opaque sspt_orig_plain<>; | ||||
| opaque sspt_pad<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* SSV GSS SealedMessage token */ | ||||
| struct ssv_seal_cipher_tkn4 { | ||||
| uint32_t ssct_ssv_seq; | ||||
| opaque ssct_iv<>; | ||||
| opaque ssct_encr_data<>; | ||||
| opaque ssct_hmac<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The token emitted by GSS_Wrap() is XDR encoded and | ||||
| of XDR data type ssv_seal_cipher_tkn4. | ||||
| </t> | ||||
| <t> | ||||
| The ssct_ssv_seq field has the same meaning as smt_ssv_seq. | ||||
| </t> | ||||
| <t> | ||||
| The ssct_encr_data field is the result of encrypting a | ||||
| value of the XDR encoded data type ssv_seal_plain_tkn4. | ||||
| The encryption key is the subkey derived from SSV4_SUBKEY_SEAL_I2T | ||||
| or SSV4_SUBKEY_SEAL_T2I, and the encryption | ||||
| algorithm is that negotiated by EXCHANGE_ID. | ||||
| </t> | ||||
| <t> | ||||
| The ssct_iv field is the initialization vector (IV) | ||||
| for the encryption algorithm (if applicable) and is | ||||
| sent in clear text. The content and size of the IV <bcp14>MUST</bcp14> | ||||
| comply with the specification of the encryption algorithm. | ||||
| For example, the id-aes256-CBC algorithm <bcp14>MUST</bcp14> use | ||||
| a 16-byte initialization vector (IV), which <bcp14>MUST</bcp14> be | ||||
| unpredictable for each instance of a value of data type | ||||
| ssv_seal_plain_tkn4 that is encrypted with a particular | ||||
| SSV key. | ||||
| </t> | ||||
| <t> | ||||
| The ssct_hmac field is the result of computing an HMAC using the value | ||||
| of the XDR encoded data type ssv_seal_plain_tkn4 as the input | ||||
| text. The key is the subkey derived from SSV4_SUBKEY_MIC_I2T or | ||||
| SSV4_SUBKEY_MIC_T2I, and the one-way hash algorithm is that | ||||
| negotiated by EXCHANGE_ID. | ||||
| </t> | ||||
| <t> | ||||
| The sspt_confounder field is a random value. | ||||
| </t> | ||||
| <t> | ||||
| The sspt_ssv_seq field is the same as ssct_ssv_seq. | ||||
| </t> | ||||
| <t> | ||||
| The sspt_orig_plain field is the original plaintext | ||||
| and is the "input_message" input passed to | ||||
| GSS_Wrap() (see <xref target="RFC2743" sectionFormat="of" section="2.3.3"/>). | ||||
| As with the handling of the plaintext by the SSV mechanism's | ||||
| GSS_GetMIC() entry point, the entry point for GSS_Wrap() | ||||
| expects a pointer to the plaintext, and will XDR encode | ||||
| an opaque array into sspt_orig_plain | ||||
| representing the plain text, along with | ||||
| the other fields of an instance of data type ssv_seal_plain_tkn4. | ||||
| </t> | ||||
| <t> | ||||
| The sspt_pad field is present to support encryption | ||||
| algorithms that require inputs to be in fixed-sized | ||||
| blocks. The content of sspt_pad is zero filled | ||||
| except for the length. Beware that the XDR encoding | ||||
| of ssv_seal_plain_tkn4 contains three variable-length | ||||
| arrays, and so each array consumes four bytes for an | ||||
| array length, and each array that follows the length | ||||
| is always padded to a multiple of four bytes per the | ||||
| XDR standard. | ||||
| </t> | ||||
| <t> | ||||
| For example, suppose the encryption algorithm uses 16-byte blocks, and | ||||
| the sspt_confounder is three bytes long, and | ||||
| the sspt_orig_plain field is 15 bytes long. | ||||
| The XDR encoding of sspt_confounder uses eight bytes | ||||
| (4 + 3 + 1-byte pad), | ||||
| the XDR encoding of sspt_ssv_seq uses four bytes, | ||||
| the XDR encoding of sspt_orig_plain uses 20 bytes | ||||
| (4 + 15 + 1-byte pad), | ||||
| and the smallest XDR encoding of the sspt_pad field | ||||
| is four bytes. | ||||
| This totals 36 bytes. The next multiple of 16 is 48; | ||||
| thus, the length field of sspt_pad needs to be set to | ||||
| 12 bytes, or a total encoding of 16 bytes. | ||||
| The total number of XDR encoded bytes is thus 8 + | ||||
| 4 + 20 + 16 = 48. | ||||
| </t> | ||||
| <t> | ||||
| GSS_Wrap() emits a token that is an XDR | ||||
| encoding of a value of data type ssv_seal_cipher_tkn4. | ||||
| Note that regardless of whether or not the caller of GSS_Wrap() | ||||
| requests confidentiality, the token always has | ||||
| confidentiality. This is because the SSV mechanism | ||||
| is for RPCSEC_GSS, and RPCSEC_GSS never produces | ||||
| GSS_Wrap() tokens without confidentiality. | ||||
| </t> | ||||
| <t> | ||||
| There is one SSV per client ID. | ||||
| There is a single GSS context for | ||||
| a client ID / SSV pair. | ||||
| All SSV mechanism RPCSEC_GSS handles of a client ID / SSV pair | ||||
| share the same GSS context. | ||||
| SSV GSS contexts do not expire except when the SSV | ||||
| is destroyed (causes would include the client ID | ||||
| being destroyed or a server restart). | ||||
| Since one | ||||
| purpose of context expiration is to replace keys that | ||||
| have been in use for "too long", hence vulnerable to | ||||
| compromise by brute force or accident, the client can | ||||
| replace the SSV key by | ||||
| sending periodic SET_SSV operations, which is done by cycling through | ||||
| different users' RPCSEC_GSS credentials. This way, the SSV is | ||||
| replaced without destroying the SSV's GSS contexts. | ||||
| </t> | ||||
| <t> | ||||
| SSV RPCSEC_GSS handles can be expired or deleted by the server | ||||
| at any time, and the EXCHANGE_ID operation can be used to create | ||||
| more SSV RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles | ||||
| does not imply that the SSV or its GSS context has expired. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>MUST</bcp14> establish an SSV via SET_SSV before the | ||||
| SSV GSS context can be used to emit tokens from GSS_Wrap() | ||||
| and GSS_GetMIC(). If SET_SSV has not been successfully | ||||
| called, attempts to emit tokens <bcp14>MUST</bcp14> fail. | ||||
| </t> | ||||
| <t> | ||||
| The SSV mechanism does not support replay detection and sequencing | ||||
| in its tokens because RPCSEC_GSS does not use those features (see | ||||
| "Context Creation Requests", <xref target="RFC2203" sectionFormat="of" section="5.2.2"/>). However, <xref target="rpcsec_ssv_consider" format="default"/> discusses special | ||||
| considerations for the SSV mechanism when used with RPCSEC_GSS. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] The SSV GSS Mechanism --> | ||||
| <section anchor="rpcsec_ssv_consider" numbered="true" toc="default"> | ||||
| <name>Security Considerations for RPCSEC_GSS When Using the SSV Mechanism</name> | ||||
| <t> | ||||
| When a client ID is created with SP4_SSV state protection (see <xref target="OP_EXCHANGE_ID" format="default"/>), the client is permitted to associate | ||||
| multiple RPCSEC_GSS handles with the single SSV GSS context | ||||
| (see <xref target="ssv_mech" format="default"/>). Because of the way RPCSEC_GSS | ||||
| (both version 1 and version 2, see <xref target="RFC2203" format="default"/> and | ||||
| <xref target="RFC5403" format="default"/>) calculate the verifier of the reply, | ||||
| special care must be taken by the implementation of the NFSv4.1 | ||||
| client to prevent attacks by a man-in-the-middle. The verifier | ||||
| of an RPCSEC_GSS reply is the output of GSS_GetMIC() applied to | ||||
| the input value of the seq_num field of the RPCSEC_GSS credential | ||||
| (data type rpc_gss_cred_ver_1_t) (see <xref target="RFC2203" sectionFormat="of" section="5.3.3.2"/>). If multiple RPCSEC_GSS handles share | ||||
| the same | ||||
| GSS context, then if one handle is used to send a request with the | ||||
| same seq_num value as another handle, an attacker could block the | ||||
| reply, and replace it with the verifier used for the other handle. | ||||
| </t> | ||||
| <t> | ||||
| There are multiple ways to prevent the attack on the SSV RPCSEC_GSS | ||||
| verifier in the reply. The simplest is believed to be as follows. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Each time one or more new SSV RPCSEC_GSS handles are created via | ||||
| EXCHANGE_ID, the client <bcp14>SHOULD</bcp14> send a SET_SSV operation to modify | ||||
| the SSV. By changing the SSV, the new handles will not result in the | ||||
| re-use of an SSV RPCSEC_GSS verifier in a reply. | ||||
| </li> | ||||
| <li> | ||||
| When a requester decides to use N SSV RPCSEC_GSS handles, it <bcp14>SHOULD</bcp14> | ||||
| assign a unique and non-overlapping range of seq_nums to each SSV | ||||
| RPCSEC_GSS handle. The size of each range <bcp14>SHOULD</bcp14> be equal to MAXSEQ | ||||
| / N (see <xref target="RFC2203" sectionFormat="of" section="5"/> for the definition | ||||
| of MAXSEQ). When an SSV RPCSEC_GSS handle reaches its maximum, it | ||||
| <bcp14>SHOULD</bcp14> force the replier to destroy the handle by sending a NULL | ||||
| RPC request with seq_num set to MAXSEQ + 1 (see | ||||
| <xref target="RFC2203" sectionFormat="of" section="5.3.3.3"/>). | ||||
| </li> | ||||
| <li> | ||||
| When the requester wants to increase or decrease N, it <bcp14>SHOULD</bcp14> force | ||||
| the replier to destroy all N handles by sending a NULL RPC request on | ||||
| each handle with seq_num set to MAXSEQ + 1. If the requester is the | ||||
| client, it <bcp14>SHOULD</bcp14> send a SET_SSV operation before using new handles. | ||||
| If the requester is the server, then the client <bcp14>SHOULD</bcp14> send a SET_SSV | ||||
| operation when it detects that the server has forced it to destroy a | ||||
| backchannel's SSV RPCSEC_GSS handle. By sending a SET_SSV operation, | ||||
| the SSV will change, and so the attacker will be unable to | ||||
| successfully replay a previous verifier in a reply to the requester. | ||||
| </li> | ||||
| </ul> | ||||
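| <t> | ||||
| The range assignment in the second item above can be sketched as follows. This is a non-normative sketch: the function name seq_num_ranges is hypothetical, MAXSEQ takes the value defined in RFC 2203, and starting the first range at zero is an assumption made for illustration. | ||||
| </t> | ||||

```python
MAXSEQ = 0x80000000  # value defined by RFC 2203


def seq_num_ranges(n):
    """Split [0, MAXSEQ) into n disjoint ranges of size MAXSEQ // n.

    Returns one inclusive (first, last) pair per SSV RPCSEC_GSS handle.
    """
    if n not in range(1, MAXSEQ + 1):
        raise ValueError("n must be between 1 and MAXSEQ")
    size = MAXSEQ // n
    return [(i * size, (i + 1) * size - 1) for i in range(n)]
```

| <t> | ||||
| A handle that has used the last value of its range would then be retired by sending a NULL RPC request with seq_num set to MAXSEQ + 1, as described above. | ||||
| </t> | ||||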
| <t> | ||||
| Note that if the replier carefully creates the SSV RPCSEC_GSS | ||||
| handles, the related risk of a man-in-the-middle splicing a forged | ||||
| SSV RPCSEC_GSS credential with a verifier for another handle does | ||||
| not exist. This is because the verifier in an RPCSEC_GSS request | ||||
| is computed from input that includes both the RPCSEC_GSS handle and | ||||
| seq_num (see <xref target="RFC2203" sectionFormat="of" section="5.3.1"/>). Provided the | ||||
| replier takes care to avoid re-using the value of an RPCSEC_GSS | ||||
| handle that it creates, such as by including a generation number in the | ||||
| handle, the man-in-the-middle will not be able to successfully replay | ||||
| a previous verifier in the request to a replier. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="Session_Mechanics_Steady_State" numbered="true" toc="default"> | ||||
| <name>Session Mechanics - Steady State</name> | ||||
| <section anchor="Obligations_of_the_Server" numbered="true" toc="default"> | ||||
| <name>Obligations of the Server</name> | ||||
| <t> | ||||
| The server has the primary obligation to monitor the | ||||
| state of backchannel resources that the client has | ||||
| created for the server (RPCSEC_GSS contexts and backchannel | ||||
| connections). If these resources vanish, the | ||||
| server takes action as specified in <xref target="Events_Requiring_Server_Action" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Obligations of the Server --> | ||||
| <section anchor="Obligations_of_the_Client" numbered="true" toc="default"> | ||||
| <name>Obligations of the Client</name> | ||||
| <t> | ||||
| The client <bcp14>SHOULD</bcp14> honor the following obligations in order to | ||||
| utilize the session: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Keep a necessary session from going idle on the server. A client | ||||
| that requires a session but nonetheless is not | ||||
| sending operations risks having the session be destroyed | ||||
| by the server. This is because sessions consume | ||||
| resources, and resource limitations may force the | ||||
| server to cull an inactive session. A server <bcp14>MAY</bcp14> consider | ||||
| a session to be inactive if the client has not used | ||||
| the session before the session inactivity timer (<xref target="session_inactive" format="default"/>) has expired. | ||||
| </li> | ||||
| <li> | ||||
| Destroy the session when not needed. If a client has | ||||
| multiple sessions, one of which has no | ||||
| requests waiting for replies, and has been idle for | ||||
| some period of time, it <bcp14>SHOULD</bcp14> destroy the session. | ||||
| </li> | ||||
| <li> | ||||
| Maintain GSS contexts and RPCSEC_GSS handles | ||||
| for the backchannel. If the client | ||||
| requires the server to use the RPCSEC_GSS security | ||||
| flavor for callbacks, then it needs to be sure the | ||||
| RPCSEC_GSS handles and/or their GSS | ||||
| contexts that are handed to the server via BACKCHANNEL_CTL or | ||||
| CREATE_SESSION are unexpired. | ||||
| </li> | ||||
| <li> | ||||
| Preserve a connection for a backchannel. The server | ||||
| requires a backchannel in order to gracefully recall | ||||
| recallable state or notify the client of certain | ||||
| events. Note that if the connection is not being used | ||||
| for the fore channel, there is no way for the client to tell | ||||
| if the connection is still alive (e.g., the server | ||||
| restarted without sending a disconnect). The onus is | ||||
| on the server, not the client, to determine if the | ||||
| backchannel's connection is alive, and to indicate in | ||||
| the response to a SEQUENCE operation when the last | ||||
| connection associated with a session's backchannel | ||||
| has disconnected. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <!-- [auth] Obligations of the Client --> | ||||
| <section anchor="Steps_the_Client_Takes_To_Establish_a_Session" numbered="true" toc="default"> | ||||
| <name>Steps the Client Takes to Establish a Session</name> | ||||
| <t> | ||||
| If the client does not have a client ID, the client | ||||
| sends EXCHANGE_ID to establish a client ID. If it | ||||
| opts for SP4_MACH_CRED or SP4_SSV protection, in the | ||||
| spo_must_enforce list of operations, it <bcp14>SHOULD</bcp14> at | ||||
| minimum specify CREATE_SESSION, DESTROY_SESSION, | ||||
| BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. | ||||
| If it opts for SP4_SSV protection, the client needs to | ||||
| ask for SSV-based RPCSEC_GSS handles. | ||||
| </t> | ||||
| <t> | ||||
| The client uses the client ID to send a | ||||
| CREATE_SESSION on a connection to the server. | ||||
| The results of CREATE_SESSION indicate whether or not the | ||||
| server will persist the session reply cache across | ||||
| a server restart, and the client notes this | ||||
| for future reference. | ||||
| </t> | ||||
| <t> | ||||
| If the client specified SP4_SSV state protection | ||||
| when the client ID was created, then it <bcp14>SHOULD</bcp14> send | ||||
| SET_SSV in the first COMPOUND after the session is | ||||
| created. Each time a new principal goes to use the | ||||
| client ID, it <bcp14>SHOULD</bcp14> send a SET_SSV again. | ||||
| </t> | ||||
| <t> | ||||
| If the client wants to use delegations, layouts, | ||||
| directory notifications, or any other state that | ||||
| requires a backchannel, then it needs to add a connection | ||||
| to the backchannel if CREATE_SESSION did not already | ||||
| do so. The client creates a connection, and calls | ||||
| BIND_CONN_TO_SESSION to associate the connection | ||||
| with the session and the session's backchannel. If | ||||
| CREATE_SESSION did not already do so, the client <bcp14>MUST</bcp14> | ||||
| tell the server what security is required in order | ||||
| for the client to accept callbacks. The client does | ||||
| this via BACKCHANNEL_CTL. If the client selected | ||||
| SP4_MACH_CRED or SP4_SSV protection when it called | ||||
| EXCHANGE_ID, then the client <bcp14>SHOULD</bcp14> specify that the | ||||
| backchannel use RPCSEC_GSS contexts for security. | ||||
| </t> | ||||
| <t> | ||||
| If the client wants to use additional | ||||
| connections for the backchannel, then it needs to call | ||||
| BIND_CONN_TO_SESSION on each connection it wants to | ||||
| use with the session. If the client wants to use | ||||
| additional connections for the fore channel, then | ||||
| it needs to call BIND_CONN_TO_SESSION if it specified | ||||
| SP4_SSV or SP4_MACH_CRED state protection when the | ||||
| client ID was created. | ||||
| </t> | ||||
| <t> | ||||
| At this point, the session has reached steady state. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Steps the Client Takes To Establish a Session --> | ||||
| </section> | ||||
| <!-- [auth] Session Mechanics - Steady State --> | ||||
| <section anchor="session_inactive" numbered="true" toc="default"> | ||||
| <name>Session Inactivity Timer</name> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> maintain a session inactivity timer for | ||||
| each session. If the session inactivity timer expires, | ||||
| then the server <bcp14>MAY</bcp14> destroy the session. To avoid losing | ||||
| a session due to inactivity, the client <bcp14>MUST</bcp14> renew | ||||
| the session inactivity timer. The length of the session | ||||
| inactivity timer <bcp14>MUST NOT</bcp14> be less than the lease_time | ||||
| attribute (<xref target="attrdef_lease_time" format="default"/>). | ||||
| As with lease renewal (<xref target="lease_renewal" format="default"/>), | ||||
| when the server receives a SEQUENCE operation, | ||||
| it resets the session inactivity timer, and <bcp14>MUST NOT</bcp14> allow the | ||||
| timer to expire while the rest of the operations in the | ||||
| COMPOUND procedure's request are still executing. Once the | ||||
| last operation has finished, the server <bcp14>MUST</bcp14> set the session | ||||
| inactivity timer to expire no sooner than the sum of the | ||||
| current time and the value of the lease_time attribute. | ||||
| </t> | ||||
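| <t> | ||||
| The server-side timer rules above can be sketched as follows. This is a minimal, non-normative sketch; the class and method names are hypothetical, and times are plain numbers of seconds supplied by the caller. | ||||
| </t> | ||||

```python
class SessionInactivityTimer:
    """Tracks when a server may treat a session as inactive."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.executing = 0  # COMPOUNDs currently running on the session

    def on_sequence(self, now):
        # SEQUENCE received: reset the timer and hold it while
        # the rest of the COMPOUND is still executing.
        self.executing += 1
        self.expires_at = now + self.lease_time

    def on_compound_done(self, now):
        # Last operation finished: expire no sooner than now + lease_time.
        self.executing -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def expired(self, now):
        # Never report expiry while a COMPOUND is in progress.
        return self.executing == 0 and now >= self.expires_at
```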
| </section> | ||||
| <section anchor="Session_Mechanics_Recovery" numbered="true" toc="default"> | ||||
| <name>Session Mechanics - Recovery</name> | ||||
| <section anchor="Events_Requiring_Client_Action" numbered="true" toc="default"> | ||||
| <name>Events Requiring Client Action</name> | ||||
| <t> | ||||
| The following events require client action to recover. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>RPCSEC_GSS Context Loss by Callback Path</name> | ||||
| <t> | ||||
| If all RPCSEC_GSS handles | ||||
| granted by the client to the server for callback use have | ||||
| expired, the client <bcp14>MUST</bcp14> | ||||
| establish a new handle via BACKCHANNEL_CTL. The | ||||
| sr_status_flags field of the SEQUENCE results indicates when callback handles | ||||
| are nearly expired, or fully expired (see <xref target="OP_SEQUENCE_DESCRIPTION" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] RPCSEC_GSS Context Loss by Callback_Path --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Connection Loss</name> | ||||
| <t> | ||||
| If the client loses the last connection of the session | ||||
| and wants to retain the session, then it needs to | ||||
| create a new connection, and if, when the client | ||||
| ID was created, BIND_CONN_TO_SESSION was specified | ||||
| in the spo_must_enforce list, the client <bcp14>MUST</bcp14> use | ||||
| BIND_CONN_TO_SESSION to associate the connection with | ||||
| the session. | ||||
| </t> | ||||
| <t> | ||||
| If there was a request outstanding at the time | ||||
| of connection loss, then if the client wants to continue | ||||
| to use the session, it <bcp14>MUST</bcp14> retry the request, as | ||||
| described in | ||||
| <xref target="Retry_and_Replay" format="default"/>. Note that it | ||||
| is not necessary to retry requests over a connection | ||||
| with the same source network address or the same | ||||
| destination network address as the lost connection. As | ||||
| long as the session ID, slot ID, and sequence ID in the | ||||
| retry match those of the original request, the server | ||||
| will recognize the request as a retry if it executed | ||||
| the request prior to disconnect. | ||||
| </t> | ||||
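| <t> | ||||
| The reason a retry may arrive over any connection can be illustrated with a simplified, non-normative sketch of a replier's slot cache. The class name SlotReplyCache is hypothetical, and the sketch deliberately omits the checks for misordered or out-of-range sequence IDs that a real replier performs. | ||||
| </t> | ||||

```python
class SlotReplyCache:
    """Caches the last reply per (session ID, slot ID) pair."""

    def __init__(self):
        self._slots = {}  # (session_id, slot_id) maps to (seq_id, reply)

    def execute(self, session_id, slot_id, seq_id, run):
        """Run the request once; replay the cached reply on a retry."""
        cached = self._slots.get((session_id, slot_id))
        if cached is not None and cached[0] == seq_id:
            # Same session ID, slot ID, and sequence ID: a retry.
            # The connection it arrived on plays no part in the match.
            return cached[1]
        reply = run()
        self._slots[(session_id, slot_id)] = (seq_id, reply)
        return reply
```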
| <t> | ||||
| If the connection that was lost was the last one associated with | ||||
| the backchannel, and the client wants to retain the backchannel and/or | ||||
| prevent revocation of recallable state, the client needs to | ||||
| reconnect, and if it does, it | ||||
| <bcp14>MUST</bcp14> associate the connection to the session and backchannel via | ||||
| BIND_CONN_TO_SESSION. | ||||
| The server <bcp14>SHOULD</bcp14> indicate when it has no callback connection | ||||
| via the sr_status_flags result from SEQUENCE. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Connection Disconnect --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Backchannel GSS Context Loss</name> | ||||
| <t> | ||||
| Via the sr_status_flags result of the SEQUENCE operation or | ||||
| other means, the client will learn if some or all of | ||||
| the RPCSEC_GSS contexts it assigned to the backchannel have | ||||
| been lost. If the client wants to retain the backchannel and/or | ||||
| avoid subjecting recallable state to revocation, | ||||
| the client needs to use BACKCHANNEL_CTL to | ||||
| assign new contexts. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Backchannel GSS Context Loss --> | ||||
| <section anchor="loss_of_session" numbered="true" toc="default"> | ||||
| <name>Loss of Session</name> | ||||
| <t> | ||||
| The replier might lose a record of the session. Causes include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Replier failure and restart. | ||||
| </li> | ||||
| <li> | ||||
| A catastrophe that causes the reply cache to be corrupted or | ||||
| lost on the media on which it was stored. This applies | ||||
| even if the replier indicated in the CREATE_SESSION results | ||||
| that it would persist the cache. | ||||
| </li> | ||||
| <li> | ||||
| The server purges the session of a client that has been | ||||
| inactive for a very extended period of time. | ||||
| </li> | ||||
| <li> | ||||
| As a result of configuration changes among a set of clustered | ||||
| servers, a network address previously connected to one | ||||
| server becomes connected to a different server that has | ||||
| no knowledge of the session in question. Such a configuration | ||||
| change will generally only happen when the original server | ||||
| ceases to function for a time. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Loss of reply cache is equivalent to loss of session. | ||||
| The replier indicates loss of session to the requester | ||||
| by returning NFS4ERR_BADSESSION on the next operation | ||||
| that uses the session ID that refers to the lost | ||||
| session. | ||||
| </t> | ||||
| <t> | ||||
| After an event like a server restart, the client may have | ||||
| lost its connections. The client assumes for the moment | ||||
| that the session has not been lost. It reconnects, and | ||||
| if it specified connection association enforcement when | ||||
| the session was created, it | ||||
| invokes BIND_CONN_TO_SESSION using the session ID. Otherwise, | ||||
| it invokes SEQUENCE. If | ||||
| BIND_CONN_TO_SESSION or SEQUENCE returns NFS4ERR_BADSESSION, the | ||||
| client knows the session is not available to it when communicating | ||||
| with that network address. If the connection survives | ||||
| session loss, then the next SEQUENCE operation the client | ||||
| sends over the connection will get back NFS4ERR_BADSESSION. | ||||
| The client again knows the session was lost. | ||||
| </t> | ||||
| <t> | ||||
| Here is one suggested algorithm for the client when it gets | ||||
| NFS4ERR_BADSESSION. It is not obligatory in that, if a | ||||
| client does not want to take advantage of such features as | ||||
| trunking, it may omit parts of it. However, it is a useful | ||||
| example that draws attention to various possible recovery | ||||
| issues: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| If the client has other connections to | ||||
| other server network addresses | ||||
| associated with the same session, attempt | ||||
| a COMPOUND with a single operation, SEQUENCE, | ||||
| on each of the other connections. | ||||
| </li> | ||||
| <li> | ||||
| If the attempts succeed, the session is still alive, | ||||
| and this is a strong indicator that the server's | ||||
| network address has moved. | ||||
| The client might send an EXCHANGE_ID on the | ||||
| connection that returned NFS4ERR_BADSESSION | ||||
| to see if there are opportunities for client ID | ||||
| trunking (i.e., the same client ID and so_major_id value | ||||
| are | ||||
| returned). The client might use DNS to see if | ||||
| the moved network address was replaced with another, | ||||
| so that the performance and availability benefits of | ||||
| session trunking can continue. | ||||
| </li> | ||||
| <li> | ||||
| If the SEQUENCE requests fail with NFS4ERR_BADSESSION, | ||||
| then the session no longer exists on any of the | ||||
| server network addresses for which the client has connections | ||||
| associated with that session ID. It is possible the | ||||
| session is still alive and available on other | ||||
| network addresses. The client sends an EXCHANGE_ID | ||||
| on all the connections to see if the server owner | ||||
| is still listening on those network addresses. | ||||
| If the same server owner is returned but a new | ||||
| client ID is returned, this is a strong | ||||
| indicator of a server restart. If both the same | ||||
| server owner and same client ID are | ||||
| returned, then this is a strong indication | ||||
| that the server did delete the session, and the | ||||
| client will need to send a CREATE_SESSION if it | ||||
| has no other sessions for that client ID. | ||||
| If a different server owner is returned, | ||||
| the client can use DNS to find | ||||
| other network addresses. If it does not, or if | ||||
| DNS does not find any other addresses for the server, | ||||
| then the client will be unable to provide NFSv4.1 | ||||
| service, and fatal errors should be returned | ||||
| to processes that were using the server. If the | ||||
| client is using a "mount" paradigm, unmounting | ||||
| the server is advised. | ||||
| </li> | ||||
| <li> | ||||
| If the client knows of no other connections associated | ||||
| with the session ID and server network addresses that | ||||
| are, or have been, associated with the session ID, | ||||
| then the client can use DNS to find | ||||
| other network addresses. If it does not, or if | ||||
| DNS does not find any other addresses for the server, | ||||
| then the client will be unable to provide NFSv4.1 | ||||
| service, and fatal errors should be returned | ||||
| to processes that were using the server. If the | ||||
| client is using a "mount" paradigm, unmounting | ||||
| the server is advised. | ||||
| </li> | ||||
| </ol> | ||||
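| <t> | ||||
| The interpretation of the EXCHANGE_ID results in step 3 can be sketched as follows. This is a non-normative sketch; the names Diagnosis and diagnose are hypothetical, and the outcomes are, as stated above, strong indications rather than certainties. | ||||
| </t> | ||||

```python
from enum import Enum


class Diagnosis(Enum):
    SERVER_RESTARTED = "server restart"
    SESSION_DELETED = "session deleted; send CREATE_SESSION"
    DIFFERENT_SERVER = "different server; look up other addresses"


def diagnose(old_owner, old_client_id, new_owner, new_client_id):
    """Interpret an EXCHANGE_ID reply sent after NFS4ERR_BADSESSION."""
    if new_owner != old_owner:
        # A different server owner now answers at this address.
        return Diagnosis.DIFFERENT_SERVER
    if new_client_id != old_client_id:
        # Same server owner, new client ID: the server likely restarted.
        return Diagnosis.SERVER_RESTARTED
    # Same server owner and client ID: the session itself was deleted.
    return Diagnosis.SESSION_DELETED
```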
| <t> | ||||
| If there is a reconfiguration event that results in the | ||||
| same network address being assigned to servers where the | ||||
| eir_server_scope value is different, it cannot be guaranteed | ||||
| that a session ID generated by the first will be recognized | ||||
| as invalid by the second. Therefore, in managing server | ||||
| reconfigurations among servers with different server scope | ||||
| values, it is necessary to make sure that all clients have | ||||
| disconnected from the first server before effecting | ||||
| the reconfiguration. Nonetheless, clients should not | ||||
| assume that servers will always adhere to this requirement; | ||||
| clients <bcp14>MUST</bcp14> be prepared to deal with unexpected | ||||
| effects of server reconfigurations. | ||||
| Even where a session ID is inappropriately | ||||
| recognized as valid, it is likely either that the connection | ||||
| will not be recognized as valid or that a sequence value | ||||
| for a slot will not be correct. Therefore, when a client | ||||
| receives results indicating such unexpected errors, the use of | ||||
| EXCHANGE_ID to determine the current server configuration | ||||
| is <bcp14>RECOMMENDED</bcp14>. | ||||
| </t> | ||||
| <t> | ||||
| A variation on the above is that after a server's network | ||||
| address moves, there is no NFSv4.1 server listening, e.g., no | ||||
| listener on port 2049. In this example, one of the following occurs: the NFSv4 server returns | ||||
| NFS4ERR_MINOR_VERS_MISMATCH, the NFS server returns a | ||||
| PROG_MISMATCH error, the RPC listener on 2049 returns | ||||
| PROG_UNAVAIL, or attempts to reconnect to the network address | ||||
| time out. These <bcp14>SHOULD</bcp14> be treated as equivalent to SEQUENCE | ||||
| returning NFS4ERR_BADSESSION for these purposes. | ||||
| </t> | ||||
| <t> | ||||
| When the client detects session loss, it needs to call CREATE_SESSION | ||||
| to recover. Any non-idempotent operations that were in progress | ||||
| might have been performed on the server at the time of | ||||
| session loss. The client has no general way to recover from this. | ||||
| </t> | ||||
| <t> | ||||
| Note that loss of session does not imply loss of byte-range lock, open, delegation, | ||||
| or layout state because locks, opens, delegations, and layouts | ||||
| are tied to the client ID and depend on the client ID, not the session. | ||||
| Nor does loss of byte-range lock, open, delegation, | ||||
| or layout state imply loss of session state, because the session depends | ||||
| on the client ID; loss of client ID, however, does imply loss of | ||||
| session, byte-range lock, open, delegation, and layout state. | ||||
| See <xref target="server_failure" format="default"/>. | ||||
| A session can survive a server restart, | ||||
| but lock recovery may still be needed. | ||||
| </t> | ||||
| <t> | ||||
| It is possible that CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID | ||||
| (e.g., the server restarts and does not preserve client ID | ||||
| state). | ||||
| If so, the client needs to call EXCHANGE_ID, followed by | ||||
| CREATE_SESSION. | ||||
| </t> | ||||
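| <t> | ||||
| The recovery sequence described above can be sketched as follows. | ||||
| This is a minimal illustration only, not part of the protocol: the | ||||
| client object, its create_session() and exchange_id() methods, the | ||||
| StaleClientID exception, and the FakeClient test double are all | ||||
| hypothetical names introduced for this example. | ||||
| </t> | ||||

```python
class StaleClientID(Exception):
    """Hypothetical stand-in for an NFS4ERR_STALE_CLIENTID result."""


def recover_session(client):
    """Re-establish a session after session loss.

    `client` is a hypothetical object whose create_session() raises
    StaleClientID when the server no longer recognizes the client ID.
    """
    try:
        return client.create_session()
    except StaleClientID:
        # The server did not preserve client ID state (e.g., it restarted),
        # so obtain a new client ID before creating the session.
        client.exchange_id()
        return client.create_session()


class FakeClient:
    """Test double: the first CREATE_SESSION fails as if the server restarted."""

    def __init__(self):
        self.calls = []
        self.client_id_valid = False

    def exchange_id(self):
        self.calls.append("EXCHANGE_ID")
        self.client_id_valid = True

    def create_session(self):
        self.calls.append("CREATE_SESSION")
        if not self.client_id_valid:
            raise StaleClientID()
        return "session"
```

| <t> | ||||
| With the test double, recover_session() issues CREATE_SESSION, then | ||||
| EXCHANGE_ID, then CREATE_SESSION again. | ||||
| </t> | ||||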
| </section> | ||||
| <!-- [auth] Loss of Session --> | ||||
| </section> | ||||
| <!-- [auth] Events Requiring Client Action --> | ||||
| <section anchor="Events_Requiring_Server_Action" numbered="true" toc="default"> | ||||
| <name>Events Requiring Server Action</name> | ||||
| <t> | ||||
| The following events require server action to recover. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Client Crash and Restart</name> | ||||
| <t> | ||||
| As described in <xref target="OP_EXCHANGE_ID" format="default"/>, | ||||
| a restarted client sends EXCHANGE_ID in such a way that it | ||||
| causes the server to delete any sessions it had. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Client Crash and Restart --> | ||||
| <section anchor="client_crash_no_restart" numbered="true" toc="default"> | ||||
| <name>Client Crash with No Restart</name> | ||||
| <t> | ||||
| If a client crashes and never comes back, it will never send | ||||
| EXCHANGE_ID with its old client owner. Thus, the server has session | ||||
| state that will never be used again. After an extended period of time, | ||||
| and if the server has resource constraints, it <bcp14>MAY</bcp14> destroy the old | ||||
| session as well as locking state. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Client Crash with No Restart --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Extended Network Partition</name> | ||||
| <t> | ||||
| To the server, an extended network partition may be no | ||||
| different from a | ||||
| client crash with no | ||||
| restart (see | ||||
| <xref target="client_crash_no_restart" format="default"/>). | ||||
| Unless the server can discern that there is | ||||
| a network partition, it is free to treat the | ||||
| situation as if the client has crashed permanently. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Extended Network Partition" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Backchannel Connection Loss</name> | ||||
| <t> | ||||
| If there were callback requests outstanding at the time | ||||
| of a connection loss, then the server | ||||
| <bcp14>MUST</bcp14> retry the requests, as described in | ||||
| <xref target="Retry_and_Replay" format="default"/>. Note that it | ||||
| is not necessary to retry requests over a connection | ||||
| with the same source network address or the same destination | ||||
| network address as the lost connection. As long as | ||||
| the session ID, slot ID, and sequence ID in the retry | ||||
| match those of the original request, the callback target will | ||||
| recognize the request as a retry even if it did see the request | ||||
| prior to disconnect. | ||||
| </t> | ||||
| <t> | ||||
| If the connection lost is the last one associated with the backchannel, | ||||
| then the server <bcp14>MUST</bcp14> indicate that in the sr_status_flags field of | ||||
| every SEQUENCE reply until the backchannel is re-established. | ||||
| There are two situations, each of which uses different | ||||
| status flags: no connectivity for the session's backchannel | ||||
| and no connectivity for any session backchannel of the client. | ||||
| See <xref target="OP_SEQUENCE" format="default"/> for a description of | ||||
| the appropriate flags in sr_status_flags. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Backchannel Connection Loss --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>GSS Context Loss</name> | ||||
| <t> | ||||
| The server <bcp14>SHOULD</bcp14> monitor when the number of RPCSEC_GSS | ||||
| handles assigned to the backchannel reaches one, and when that | ||||
| one handle is near expiry (i.e., between | ||||
| one and two periods of lease time), and | ||||
| indicate so in the sr_status_flags field of all SEQUENCE replies. | ||||
| The server <bcp14>MUST</bcp14> indicate when all of the | ||||
| backchannel's assigned RPCSEC_GSS handles | ||||
| have expired via the sr_status_flags field of all SEQUENCE replies. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] GSS Context Loss --> | ||||
| </section> | ||||
| <!-- [auth] Events Requiring Server Action --> | ||||
| </section> | ||||
| <!-- [auth] Session Mechanics - Recovery --> | ||||
| <section anchor="pnfs_and_sessions" numbered="true" toc="default"> | ||||
| <name>Parallel NFS and Sessions</name> | ||||
| <t> | ||||
| A client and server can each potentially be a non-pNFS implementation, | ||||
| a metadata server implementation, a data server implementation, or a | ||||
| combination of two or three of these types. The EXCHGID4_FLAG_USE_NON_PNFS, | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags | ||||
| (not mutually exclusive) are passed in the EXCHANGE_ID arguments | ||||
| and results to allow the client to indicate how it wants to use sessions created | ||||
| under the client ID, and to allow the server to indicate how it | ||||
| will allow the sessions to be used. | ||||
| See <xref target="pnfs_session_stuff" format="default"/> for pNFS sessions considerations. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] Parallel NFS and Sessions --> | ||||
| </section> | ||||
| <!-- [auth] Session --> | ||||
| </section> | ||||
| <!-- [auth] Core Infrastructure --> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Protocol Constants and Data Types</name> | ||||
| <t> | ||||
| The syntax and semantics to describe the data types of the NFSv4.1 | ||||
| protocol are defined in the XDR (<xref target="RFC4506" format="default">RFC 4506</xref>) and RPC | ||||
| (<xref target="RFC5531" format="default">RFC 5531</xref>) documents. The next sections | ||||
| build upon the XDR data types to define constants, types, and structures | ||||
| specific to this protocol. The full list of XDR data types is in <xref target="RFC5662" format="default"/>. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Basic Constants</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const NFS4_FHSIZE = 128; | ||||
| const NFS4_VERIFIER_SIZE = 8; | ||||
| const NFS4_OPAQUE_LIMIT = 1024; | ||||
| const NFS4_SESSIONID_SIZE = 16; | ||||
| const NFS4_INT64_MAX = 0x7fffffffffffffff; | ||||
| const NFS4_UINT64_MAX = 0xffffffffffffffff; | ||||
| const NFS4_INT32_MAX = 0x7fffffff; | ||||
| const NFS4_UINT32_MAX = 0xffffffff; | ||||
| const NFS4_MAXFILELEN = 0xffffffffffffffff; | ||||
| const NFS4_MAXFILEOFF = 0xfffffffffffffffe; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Except where noted, all these constants are defined in bytes. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| NFS4_FHSIZE is the maximum size of a filehandle. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_VERIFIER_SIZE is the fixed size of a verifier. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_OPAQUE_LIMIT is the maximum size of certain | ||||
| opaque information. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_SESSIONID_SIZE is the fixed size of a session identifier. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_INT64_MAX is the maximum value of a signed 64-bit integer. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_UINT64_MAX is the maximum value of an unsigned 64-bit integer. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_INT32_MAX is the maximum value of a signed 32-bit integer. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_UINT32_MAX is the maximum value of an unsigned 32-bit integer. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_MAXFILELEN is the maximum length of a regular file. | ||||
| </li> | ||||
| <li> | ||||
| NFS4_MAXFILEOFF is the maximum offset into a regular file. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Basic Data Types</name> | ||||
| <t> | ||||
| These are the base NFSv4.1 data types. | ||||
| </t> | ||||
| <table anchor="basic_data_types" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Data Type</th> | ||||
| <th align="left">Definition</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">int32_t</td> | ||||
| <td align="left">typedef int int32_t;</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">typedef unsigned int uint32_t;</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">int64_t</td> | ||||
| <td align="left">typedef hyper int64_t;</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">typedef unsigned hyper uint64_t;</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">attrlist4</td> | ||||
| <td align="left"><t>typedef opaque attrlist4<>;</t> | ||||
| <t>Used for file/directory attributes.</t></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">bitmap4</td> | ||||
| <td align="left"><t>typedef uint32_t bitmap4<>;</t> | ||||
| <t>Used in attribute array encoding.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">changeid4</td> | ||||
| <td align="left"><t>typedef uint64_t changeid4;</t> | ||||
| <t>Used in the definition of change_info4.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">clientid4</td> | ||||
| <td align="left"><t>typedef uint64_t clientid4;</t> | ||||
| <t>Shorthand reference to client identification.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">count4</td> | ||||
| <td align="left"><t>typedef uint32_t count4;</t> | ||||
| <t>Various count parameters (READ, WRITE, | ||||
| COMMIT).</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">length4</td> | ||||
| <td align="left"><t>typedef uint64_t length4;</t> | ||||
| <t>The length of a byte-range within a file.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mode4</td> | ||||
| <td align="left"><t>typedef uint32_t mode4;</t> | ||||
| <t>Mode attribute data type.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">nfs_cookie4</td> | ||||
| <td align="left"><t>typedef uint64_t nfs_cookie4;</t> | ||||
| <t>Opaque cookie value for READDIR.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">nfs_fh4</td> | ||||
| <td align="left"><t>typedef opaque nfs_fh4<NFS4_FHSIZE>;</t> | ||||
| <t>Filehandle definition.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">nfs_ftype4</td> | ||||
| <td align="left"><t>enum nfs_ftype4;</t> | ||||
| <t>Various defined file types.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">nfsstat4</td> | ||||
| <td align="left"><t>enum nfsstat4;</t> | ||||
| <t>Return value for operations.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">offset4</td> | ||||
| <td align="left"><t>typedef uint64_t offset4;</t> | ||||
| <t>Various offset designations (READ, WRITE, LOCK, COMMIT).</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">qop4</td> | ||||
| <td align="left"><t>typedef uint32_t qop4;</t> | ||||
| <t>Quality of protection designation in SECINFO.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">sec_oid4</td> | ||||
| <td align="left"><t>typedef opaque sec_oid4<>;</t> | ||||
| <t>Security Object Identifier. The sec_oid4 data type is not | ||||
| really opaque. Instead, it contains an ASN.1 OBJECT IDENTIFIER | ||||
| as used by GSS-API in the mech_type argument to | ||||
| GSS_Init_sec_context. See <xref target="RFC2743" | ||||
| format="default"/> for details.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">sequenceid4</td> | ||||
| <td align="left"><t>typedef uint32_t sequenceid4;</t> | ||||
| <t>Sequence number used for various session operations | ||||
| (EXCHANGE_ID, CREATE_SESSION, SEQUENCE, CB_SEQUENCE).</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">seqid4</td> | ||||
| <td align="left"><t>typedef uint32_t seqid4;</t> | ||||
| <t>Sequence identifier used for locking.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">sessionid4</td> | ||||
| <td align="left"><t>typedef opaque sessionid4[NFS4_SESSIONID_SIZE];</t> | ||||
| <t>Session identifier.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">slotid4</td> | ||||
| <td align="left"><t>typedef uint32_t slotid4;</t> | ||||
| <t>Sequencing artifact for various session operations | ||||
| (SEQUENCE, CB_SEQUENCE).</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">utf8string</td> | ||||
| <td align="left"><t>typedef opaque utf8string<>;</t> | ||||
| <t>UTF-8 encoding for strings.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">utf8str_cis</td> | ||||
| <td align="left"><t>typedef utf8string utf8str_cis;</t> | ||||
| <t>Case-insensitive UTF-8 string.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">utf8str_cs</td> | ||||
| <td align="left"><t>typedef utf8string utf8str_cs;</t> | ||||
| <t>Case-sensitive UTF-8 string.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">utf8str_mixed</td> | ||||
| <td align="left"><t>typedef utf8string utf8str_mixed;</t> | ||||
| <t>UTF-8 strings with a case-sensitive prefix and a | ||||
| case-insensitive suffix.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">component4</td> | ||||
| <td align="left"><t>typedef utf8str_cs component4;</t> | ||||
| <t>Represents pathname components.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">linktext4</td> | ||||
| <td align="left"><t>typedef utf8str_cs linktext4;</t> | ||||
| <t>Symbolic link contents ("symbolic link" is defined in an | ||||
| <xref target="symlink" format="default">Open Group</xref> | ||||
| standard).</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">pathname4</td> | ||||
| <td align="left"><t>typedef component4 pathname4<>;</t> | ||||
| <t>Represents pathname for fs_locations.</t> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">verifier4</td> | ||||
| <td align="left"><t>typedef opaque verifier4[NFS4_VERIFIER_SIZE];</t> | ||||
| <t>Verifier used for various operations (COMMIT, CREATE, | ||||
| EXCHANGE_ID, OPEN, READDIR, WRITE). NFS4_VERIFIER_SIZE is defined | ||||
| as 8.</t> | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t>End of Base Data Types</t> | ||||
| </section> | ||||
| <!-- [auth] start here for the structured data types --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Structured Data Types</name> | ||||
| <section toc="exclude" anchor="nfstime4" numbered="true"> | ||||
| <name>nfstime4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct nfstime4 { | ||||
| int64_t seconds; | ||||
| uint32_t nseconds; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The nfstime4 data type gives the number of seconds and | ||||
| nanoseconds since midnight or zero hour January 1, 1970 | ||||
| Coordinated Universal Time (UTC). Values greater than zero | ||||
| for the seconds field denote dates after the zero hour January 1, | ||||
| 1970. Values less than zero for the seconds field denote | ||||
| dates before the zero hour January 1, 1970. In both cases, the | ||||
| nseconds field is to be added to the seconds field for the | ||||
| final time representation. For example, if the time to be | ||||
| represented is one-half second before zero hour January 1, 1970, | ||||
| the seconds field would have a value of negative one (-1) and | ||||
| the nseconds field would have a value of one-half second | ||||
| (500000000). Values greater than 999,999,999 for nseconds are | ||||
| invalid. | ||||
| </t> | ||||
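| <t> | ||||
| The rule that nseconds is always added to seconds can be illustrated | ||||
| with a short sketch. It assumes the time is available as a signed | ||||
| count of nanoseconds relative to the epoch; the function names | ||||
| to_nfstime4 and from_nfstime4 are hypothetical and exist only for | ||||
| this example. | ||||
| </t> | ||||

```python
NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(total_nanoseconds):
    """Split a signed nanosecond offset from the epoch into (seconds, nseconds).

    Floor division keeps nseconds in the range 0..999,999,999, so the
    fraction is always added to the seconds field, even for negative times.
    """
    seconds, nseconds = divmod(total_nanoseconds, NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds, nseconds):
    """Recombine the two fields; reject nseconds values the text calls invalid."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds must not exceed 999,999,999")
    return seconds * NSEC_PER_SEC + nseconds
```

| <t> | ||||
| One-half second before the epoch is -500,000,000 nanoseconds, which | ||||
| the sketch splits into seconds = -1 and nseconds = 500000000, matching | ||||
| the example in the text. | ||||
| </t> | ||||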
| <t> | ||||
| This data type is used to pass time and date information. A | ||||
| server converts to and from its local representation of time | ||||
| when processing time values, preserving as much accuracy as | ||||
| possible. If the precision of timestamps stored for a | ||||
| file system object is less than defined, loss of precision can | ||||
| occur. An adjunct time maintenance protocol is <bcp14>RECOMMENDED</bcp14> to | ||||
| reduce client and server time skew. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="time_how4" numbered="true"> | ||||
| <name>time_how4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum time_how4 { | ||||
| SET_TO_SERVER_TIME4 = 0, | ||||
| SET_TO_CLIENT_TIME4 = 1 | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="settime4" numbered="true"> | ||||
| <name>settime4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union settime4 switch (time_how4 set_it) { | ||||
| case SET_TO_CLIENT_TIME4: | ||||
| nfstime4 time; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The time_how4 and settime4 data types are used | ||||
| for setting timestamps in file object attributes. If set_it is SET_TO_SERVER_TIME4, then the server | ||||
| uses its local representation of time for the time value. | ||||
| </t> | ||||
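| <t> | ||||
| As an illustration of how the settime4 union appears on the wire, the | ||||
| following sketch applies the standard XDR encodings (big-endian, with | ||||
| a 4-byte enum, an 8-byte hyper, and a 4-byte unsigned int). The | ||||
| function name encode_settime4 is hypothetical and is not defined by | ||||
| this protocol. | ||||
| </t> | ||||

```python
import struct

SET_TO_SERVER_TIME4 = 0
SET_TO_CLIENT_TIME4 = 1


def encode_settime4(set_it, seconds=0, nseconds=0):
    """Encode a settime4 union using XDR's big-endian layout.

    The discriminant is a 4-byte enum. Only the SET_TO_CLIENT_TIME4 arm
    carries an nfstime4 (8-byte signed seconds, 4-byte unsigned nseconds);
    every other arm is void and adds no bytes.
    """
    encoded = struct.pack(">i", set_it)
    if set_it == SET_TO_CLIENT_TIME4:
        encoded += struct.pack(">qI", seconds, nseconds)
    return encoded
```

| <t> | ||||
| In this sketch, a SET_TO_SERVER_TIME4 value occupies 4 bytes and a | ||||
| SET_TO_CLIENT_TIME4 value occupies 16 bytes. | ||||
| </t> | ||||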
| </section> | ||||
| <section toc="exclude" anchor="specdata4" numbered="true"> | ||||
| <name>specdata4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct specdata4 { | ||||
| uint32_t specdata1; /* major device number */ | ||||
| uint32_t specdata2; /* minor device number */ | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type represents the device numbers for the device file | ||||
| types NF4CHR and NF4BLK. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="fsid4" numbered="true"> | ||||
| <name>fsid4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct fsid4 { | ||||
| uint64_t major; | ||||
| uint64_t minor; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="chg_policy4" numbered="true"> | ||||
| <name>change_policy4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct change_policy4 { | ||||
| uint64_t cp_major; | ||||
| uint64_t cp_minor; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The change_policy4 data type is used for the change_policy | ||||
| <bcp14>RECOMMENDED</bcp14> attribute. It provides change sequencing indication | ||||
| analogous to the change attribute. To enable the server to | ||||
| present a value valid across server re-initialization without | ||||
| requiring persistent storage, two 64-bit quantities are used, | ||||
| allowing one to be a server instance ID and the second to be | ||||
| incremented non-persistently, within a given server instance. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="fattr4" numbered="true"> | ||||
| <name>fattr4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct fattr4 { | ||||
| bitmap4 attrmask; | ||||
| attrlist4 attr_vals; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The fattr4 data type is used to represent file and directory attributes. | ||||
| </t> | ||||
| <t> | ||||
| The bitmap is a counted array of 32-bit integers used to contain bit | ||||
| values. The position of the integer in the array that contains bit n | ||||
| can be computed from the expression (n / 32), and its bit within that | ||||
| integer is (n mod 32). | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| 0 1 | ||||
| +-----------+-----------+-----------+-- | ||||
| | count | 31 .. 0 | 63 .. 32 | | ||||
| +-----------+-----------+-----------+-- | ||||
| ]]></artwork> | ||||
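| <t> | ||||
| The word and bit computation described above can be sketched as | ||||
| follows. The bitmap is modeled as a Python list of 32-bit integers, | ||||
| with the XDR array count implied by the list length; the helper names | ||||
| bitmap4_set and bitmap4_isset are hypothetical and are used only for | ||||
| illustration. | ||||
| </t> | ||||

```python
def bitmap4_set(bitmap, n):
    """Set attribute bit n in a bitmap4, held here as a list of 32-bit words."""
    word, bit = divmod(n, 32)
    # Grow the counted array so that it includes the word holding bit n.
    while len(bitmap) <= word:
        bitmap.append(0)
    bitmap[word] |= 1 << bit


def bitmap4_isset(bitmap, n):
    """Return True if bit n is set; words beyond the array count as zero."""
    word, bit = divmod(n, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))
```

| <t> | ||||
| For example, bit 33 is held in the integer at array position 1 | ||||
| (33 / 32) as bit 1 (33 mod 32) of that integer. | ||||
| </t> | ||||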
| </section> | ||||
| <section toc="exclude" anchor="change_info4" numbered="true"> | ||||
| <name>change_info4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct change_info4 { | ||||
| bool atomic; | ||||
| changeid4 before; | ||||
| changeid4 after; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type is used with the CREATE, LINK, OPEN, REMOVE, and RENAME | ||||
| operations to let the client know the value of the change attribute | ||||
| for the directory in which the target file system object resides. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="netaddr4" numbered="true"> | ||||
| <name>netaddr4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct netaddr4 { | ||||
| /* see struct rpcb in RFC 1833 */ | ||||
| string na_r_netid<>; /* network id */ | ||||
| string na_r_addr<>; /* universal address */ | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The netaddr4 data type is used to identify network transport endpoints. | ||||
| The na_r_netid and na_r_addr fields respectively contain a netid | ||||
| and uaddr. The netid and uaddr concepts are defined in | ||||
| <xref target="RFC5665" format="default"/>. The netid and uaddr formats for | ||||
| TCP over IPv4 and TCP over IPv6 are defined in <xref target="RFC5665" format="default"/>, | ||||
| specifically Tables 2 and 3 and in | ||||
| Sections <xref target="RFC5665" section="5.2.3.3" sectionFormat="bare"/> and <xref target="RFC5665" section="5.2.3.4" sectionFormat="bare"/>. | ||||
| </t> | ||||
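| <t> | ||||
| As a rough illustration of the universal address format for TCP over | ||||
| IPv4, the sketch below assumes the "h1.h2.h3.h4.p1.p2" form, in which | ||||
| p1 and p2 are the high-order and low-order bytes of the port in | ||||
| decimal. The helper names are hypothetical, and 192.0.2.1 is a | ||||
| documentation address chosen for the example. | ||||
| </t> | ||||

```python
def ipv4_uaddr(host, port):
    """Build a universal address of the form h1.h2.h3.h4.p1.p2.

    p1 and p2 are the high-order and low-order bytes of the port, in decimal.
    """
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    return f"{host}.{port >> 8}.{port & 0xFF}"


def split_ipv4_uaddr(uaddr):
    """Recover (host, port) from an IPv4 universal address."""
    *host_parts, p1, p2 = uaddr.split(".")
    return ".".join(host_parts), int(p1) * 256 + int(p2)
```

| <t> | ||||
| Under these assumptions, port 2049 on host 192.0.2.1 yields the | ||||
| na_r_addr value "192.0.2.1.8.1", paired with an na_r_netid of "tcp". | ||||
| </t> | ||||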
| </section> | ||||
| <section toc="exclude" anchor="state_owner4" numbered="true"> | ||||
| <name>state_owner4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct state_owner4 { | ||||
| clientid4 clientid; | ||||
| opaque owner<NFS4_OPAQUE_LIMIT>; | ||||
| }; | ||||
| typedef state_owner4 open_owner4; | ||||
| typedef state_owner4 lock_owner4; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The state_owner4 data type is the base type for the | ||||
| open_owner4 (<xref target="open_owner4" format="default"/>) and | ||||
| lock_owner4 (<xref target="lock_owner4" format="default"/>) data types. | ||||
| </t> | ||||
| <section toc="exclude" anchor="open_owner4" numbered="true"> | ||||
| <name>open_owner4</name> | ||||
| <t> | ||||
| This data type is used to identify the owner of OPEN state. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="lock_owner4" numbered="true"> | ||||
| <name>lock_owner4</name> | ||||
| <t> | ||||
| This structure is used to identify the owner of byte-range | ||||
| locking state. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section toc="exclude" anchor="open_to_lock_owner4" numbered="true"> | ||||
| <name>open_to_lock_owner4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct open_to_lock_owner4 { | ||||
| seqid4 open_seqid; | ||||
| stateid4 open_stateid; | ||||
| seqid4 lock_seqid; | ||||
| lock_owner4 lock_owner; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type is used for the first LOCK operation done for | ||||
| an open_owner4. It provides both the open_stateid and | ||||
| lock_owner, such that the transition is made from a valid | ||||
| open_stateid sequence to that of the new lock_stateid | ||||
| sequence. Using this mechanism avoids the confirmation of the | ||||
| lock_owner/lock_seqid pair since it is tied to established | ||||
| state in the form of the open_stateid/open_seqid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="stateid4" numbered="true"> | ||||
| <name>stateid4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct stateid4 { | ||||
| uint32_t seqid; | ||||
| opaque other[12]; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type is used for the various state sharing | ||||
| mechanisms between the client and server. The client | ||||
| never modifies a value of data type stateid4. | ||||
| The starting value of the | ||||
| "seqid" field is undefined. The server is required to | ||||
| increment the "seqid" field by one at each transition | ||||
| of the stateid. This is important since the client will | ||||
| inspect the seqid in OPEN stateids to determine the order of | ||||
| OPEN processing done by the server. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layouttype4" numbered="true"> | ||||
| <name>layouttype4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum layouttype4 { | ||||
| LAYOUT4_NFSV4_1_FILES = 0x1, | ||||
| LAYOUT4_OSD2_OBJECTS = 0x2, | ||||
| LAYOUT4_BLOCK_VOLUME = 0x3 | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type indicates what type of layout is being used. | ||||
| The file server advertises the | ||||
| layout types it supports through the fs_layout_type file | ||||
| system attribute (<xref target="attrdef_fs_layout_type" format="default"/>). | ||||
| A client asks for layouts of a particular type in LAYOUTGET, | ||||
| and processes those layouts in its layout-type-specific logic. | ||||
| </t> | ||||
| <t> | ||||
| The layouttype4 data type is 32 bits in length. The range | ||||
| represented by the layout type is split into three parts. Type | ||||
| 0x0 is reserved. Types | ||||
| within the range 0x00000001-0x7FFFFFFF are globally unique and | ||||
| are assigned according to the description in <xref target="pnfsiana" format="default"/>; they are maintained by IANA. Types | ||||
| within the range 0x80000000-0xFFFFFFFF are site specific and | ||||
| for private use only. | ||||
| </t> | ||||
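| <t> | ||||
| The three-way split of the layouttype4 range can be sketched as | ||||
| follows. The function name classify_layouttype4 and the returned | ||||
| labels are hypothetical, chosen only to mirror the ranges given above. | ||||
| </t> | ||||

```python
def classify_layouttype4(value):
    """Name the part of the 32-bit layouttype4 range that `value` falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit quantity")
    if value == 0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "IANA-assigned"
    return "site-specific"
```
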
| <t> | ||||
| The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 | ||||
| file layout type, as defined in <xref target="file_layout_type" format="default"/>, is to be used. The LAYOUT4_OSD2_OBJECTS | ||||
| enumeration specifies that the object layout, as defined in | ||||
| <xref target="RFC5664" format="default"/>, is to be used. Similarly, | ||||
| the LAYOUT4_BLOCK_VOLUME enumeration specifies that the block/volume | ||||
| layout, as defined in <xref target="RFC5663" format="default"/>, is to be | ||||
| used. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="deviceid4" numbered="true"> | ||||
| <name>deviceid4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const NFS4_DEVICEID4_SIZE = 16; | ||||
| typedef opaque deviceid4[NFS4_DEVICEID4_SIZE]; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Layout information includes device IDs that | ||||
| specify a storage device through a compact handle. | ||||
| Addressing and type information is obtained | ||||
| with the GETDEVICEINFO operation. Device IDs | ||||
| are not guaranteed to be valid across metadata | ||||
| server restarts. A device ID is unique per client | ||||
| ID and layout type. See <xref target="device_ids" format="default"/> for more details. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="device_addr4" numbered="true"> | ||||
| <name>device_addr4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct device_addr4 { | ||||
| layouttype4 da_layout_type; | ||||
| opaque da_addr_body<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The device address is used to set up a communication channel | ||||
| with the storage device. Different layout types will require | ||||
| different data types to define how they communicate | ||||
| with storage devices. The opaque da_addr_body field is | ||||
| interpreted based on the specified da_layout_type field. | ||||
| </t> | ||||
| <t> | ||||
| This document defines the device address for the NFSv4.1 file | ||||
| layout (see <xref target="file_data_types" format="default"/>), which | ||||
| identifies a storage device by network IP address and port | ||||
| number. This is sufficient for the clients to communicate | ||||
| with the NFSv4.1 storage devices, and may be sufficient for | ||||
| other layout types as well. Device types for object-based storage | ||||
| devices and block storage devices (e.g., Small Computer System | ||||
| Interface (SCSI) volume labels) | ||||
| are defined by their respective layout specifications. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layout_content4" numbered="true"> | ||||
| <name>layout_content4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct layout_content4 { | ||||
| layouttype4 loc_type; | ||||
| opaque loc_body<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The loc_body field is interpreted based on the layout type (loc_type). | ||||
| This document defines the loc_body for the NFSv4.1 | ||||
| file layout type; see <xref target="file_data_types" format="default"/> for its definition. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layout4" numbered="true"> | ||||
| <name>layout4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct layout4 { | ||||
| offset4 lo_offset; | ||||
| length4 lo_length; | ||||
| layoutiomode4 lo_iomode; | ||||
| layout_content4 lo_content; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The layout4 data type defines a layout for a file. The | ||||
| layout-type-specific data is opaque within lo_content. | ||||
| Since layouts are sub-dividable, the offset | ||||
| and length together with the file's filehandle, the client ID, | ||||
| iomode, and layout type identify the layout. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layoutupdate4" numbered="true"> | ||||
| <name>layoutupdate4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct layoutupdate4 { | ||||
| layouttype4 lou_type; | ||||
| opaque lou_body<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The layoutupdate4 data type is used by the client to return | ||||
| updated layout information to the metadata server via the | ||||
| LAYOUTCOMMIT (<xref target="OP_LAYOUTCOMMIT" format="default"/>) operation. | ||||
| This data type provides a channel to pass | ||||
| layout-type-specific information (in the lou_body field) | ||||
| back to the metadata server. | ||||
| For example, for the block/volume layout type, this could include the | ||||
| list of reserved blocks that were written. The contents of | ||||
| the opaque lou_body argument are determined by the layout type. | ||||
| The NFSv4.1 file-based layout | ||||
| does not use this data type; if lou_type is LAYOUT4_NFSV4_1_FILES, | ||||
| the lou_body field <bcp14>MUST</bcp14> | ||||
| have a zero length. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layouthint4" numbered="true"> | ||||
| <name>layouthint4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct layouthint4 { | ||||
| layouttype4 loh_type; | ||||
| opaque loh_body<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The layouthint4 data type is used by the client to pass in a | ||||
| hint about the type of layout it would like created for a particular | ||||
| file. It is the data type specified by the layout_hint | ||||
| attribute described in <xref target="attrdef_layout_hint" format="default"/>. | ||||
| The metadata server may ignore the hint | ||||
| or may selectively ignore fields within the hint. This hint should | ||||
| be provided at create time as part of the initial attributes within | ||||
| OPEN. The loh_body field is specific to the type of layout (loh_type). | ||||
| The NFSv4.1 file-based layout uses the nfsv4_1_file_layouthint4 | ||||
| data type as defined in <xref target="file_data_types" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="layoutiomode4" numbered="true"> | ||||
| <name>layoutiomode4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum layoutiomode4 { | ||||
| LAYOUTIOMODE4_READ = 1, | ||||
| LAYOUTIOMODE4_RW = 2, | ||||
| LAYOUTIOMODE4_ANY = 3 | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The iomode specifies whether the client intends to just read or both | ||||
| read and write the data represented by the | ||||
| layout. While the LAYOUTIOMODE4_ANY iomode <bcp14>MUST NOT</bcp14> be used in | ||||
| the arguments to the LAYOUTGET operation, it <bcp14>MAY</bcp14> | ||||
| be used in the arguments to the LAYOUTRETURN and CB_LAYOUTRECALL | ||||
| operations. The LAYOUTIOMODE4_ANY iomode | ||||
| specifies that layouts pertaining to both LAYOUTIOMODE4_READ | ||||
| and LAYOUTIOMODE4_RW iomodes are being returned or recalled, | ||||
| respectively. The metadata server's use of the iomode may | ||||
| depend on the layout type being used. The storage devices <bcp14>MAY</bcp14> | ||||
| validate I/O accesses against the iomode and reject invalid accesses. | ||||
| </t> | ||||
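| <t> | ||||
| The rule about where LAYOUTIOMODE4_ANY may appear can be sketched as a small check. This is only an illustration: the class name LayoutIOMode, the function iomode_is_valid, and the use of operation names as strings are assumptions made for this example and are not part of the protocol. | ||||
| </t> | ||||

```python
from enum import IntEnum


class LayoutIOMode(IntEnum):
    """Mirror of the layoutiomode4 enum values shown above."""
    READ = 1  # LAYOUTIOMODE4_READ
    RW = 2    # LAYOUTIOMODE4_RW
    ANY = 3   # LAYOUTIOMODE4_ANY


# Hypothetical lookup: may LAYOUTIOMODE4_ANY appear in this operation's arguments?
_ANY_ALLOWED = {
    "LAYOUTGET": False,       # MUST NOT be used
    "LAYOUTRETURN": True,     # MAY be used
    "CB_LAYOUTRECALL": True,  # MAY be used
}


def iomode_is_valid(operation: str, iomode: int) -> bool:
    """Return True if the iomode is acceptable in the named operation."""
    if operation not in _ANY_ALLOWED:
        raise ValueError("operation does not carry an iomode: " + operation)
    mode = LayoutIOMode(iomode)  # raises ValueError for values outside the enum
    return mode is not LayoutIOMode.ANY or _ANY_ALLOWED[operation]
```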
| </section> | ||||
| <section toc="exclude" anchor="nfs_impl_id4" numbered="true"> | ||||
| <name>nfs_impl_id4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct nfs_impl_id4 { | ||||
| utf8str_cis nii_domain; | ||||
| utf8str_cs nii_name; | ||||
| nfstime4 nii_date; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type is used to identify client and server | ||||
| implementation details. The nii_domain field is the DNS domain | ||||
| name with which the implementor is associated. The nii_name | ||||
| field is the product name of the implementation and is | ||||
| completely free form. It is <bcp14>RECOMMENDED</bcp14> that the nii_name be | ||||
| used to distinguish machine architecture, machine platforms, | ||||
| revisions, versions, and patch levels. The nii_date field is | ||||
| the timestamp of when the software instance was published or | ||||
| built. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="threshold_item4" numbered="true"> | ||||
| <name>threshold_item4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct threshold_item4 { | ||||
| layouttype4 thi_layout_type; | ||||
| bitmap4 thi_hintset; | ||||
| opaque thi_hintlist<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type contains a list of hints specific to | ||||
| a layout type for helping the client determine when | ||||
| it should send I/O directly through the metadata | ||||
| server versus the storage devices. The data type | ||||
| consists of the layout type (thi_layout_type), | ||||
| a bitmap (thi_hintset) describing the set of | ||||
| hints supported by the server (they may differ | ||||
| based on the layout type), and a list of hints | ||||
| (thi_hintlist) whose content is determined by | ||||
| the hintset bitmap. See the mdsthreshold attribute | ||||
| for more details. | ||||
| </t> | ||||
| <t> | ||||
| The thi_hintset field is a bitmap of the following values: | ||||
| </t> | ||||
| <table align="center" anchor="table2"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">name</th> | ||||
| <th align="left">#</th> | ||||
| <th align="left">Data Type</th> | ||||
| <th align="left">Description</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">threshold4_read_size</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left">length4</td> | ||||
| <td align="left"> | ||||
| If a file's length is less than the value of threshold4_read_size, | ||||
| then it is <bcp14>RECOMMENDED</bcp14> that the client read from the file via the MDS and not | ||||
| a storage device. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">threshold4_write_size</td> | ||||
| <td align="left">1</td> | ||||
| <td align="left">length4</td> | ||||
| <td align="left"> | ||||
| If a file's length is less than the value of threshold4_write_size, | ||||
| then it is <bcp14>RECOMMENDED</bcp14> that the client write to the file via the MDS and not | ||||
| a storage device. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">threshold4_read_iosize</td> | ||||
| <td align="left">2</td> | ||||
| <td align="left">length4</td> | ||||
| <td align="left"> | ||||
| For read I/O sizes below this threshold, it is <bcp14>RECOMMENDED</bcp14> to | ||||
| read data through the MDS. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">threshold4_write_iosize</td> | ||||
| <td align="left">3</td> | ||||
| <td align="left">length4</td> | ||||
| <td align="left"> | ||||
| For write I/O sizes below this threshold, it is <bcp14>RECOMMENDED</bcp14> to | ||||
| write data through the MDS. | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
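| <t> | ||||
| As a rough sketch of how a client might apply these hints, the function below takes the hints that the server actually supplied, keyed by the bit numbers in the table, and reports whether an I/O is recommended to go through the MDS. The function name prefer_mds, the dictionary representation, and the choice to treat any single matching hint as sufficient are assumptions made for illustration. | ||||
| </t> | ||||

```python
# Bit numbers from the thi_hintset table above.
THRESHOLD4_READ_SIZE = 0
THRESHOLD4_WRITE_SIZE = 1
THRESHOLD4_READ_IOSIZE = 2
THRESHOLD4_WRITE_IOSIZE = 3


def prefer_mds(hints: dict, file_size: int, io_size: int, is_write: bool) -> bool:
    """Return True if any supplied hint recommends sending this I/O via the MDS.

    `hints` maps a bit number to its length4 value; a missing key means the
    server did not set that bit in thi_hintset.
    """
    size_bit = THRESHOLD4_WRITE_SIZE if is_write else THRESHOLD4_READ_SIZE
    iosize_bit = THRESHOLD4_WRITE_IOSIZE if is_write else THRESHOLD4_READ_IOSIZE

    size_limit = hints.get(size_bit)
    if size_limit is not None and file_size < size_limit:
        return True  # file is smaller than the file-size threshold

    iosize_limit = hints.get(iosize_bit)
    if iosize_limit is not None and io_size < iosize_limit:
        return True  # request is smaller than the I/O-size threshold

    return False  # no hint applies; the storage devices may be used
```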
| </section> | ||||
| <section toc="exclude" anchor="mdsthreshold4" numbered="true"> | ||||
| <name>mdsthreshold4</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct mdsthreshold4 { | ||||
| threshold_item4 mth_hints<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| This data type holds an array of elements of data type | ||||
| threshold_item4, | ||||
| each of which is valid for a particular layout type. An array | ||||
| is necessary because a server can support multiple layout types | ||||
| for a single file. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] End of Data Types --> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="Filehandles" numbered="true" toc="default"> | ||||
| <name>Filehandles</name> | ||||
| <t> | ||||
| The filehandle in the NFS protocol is a per-server unique identifier | ||||
| for a file system object. The contents of the filehandle are opaque | ||||
| to the client. Therefore, the server is responsible for translating | ||||
| the filehandle to an internal representation of the file system | ||||
| object. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Obtaining the First Filehandle</name> | ||||
| <t> | ||||
| The operations of the NFS protocol are defined in terms of one | ||||
| or more filehandles. Therefore, the client needs a filehandle | ||||
| to initiate communication with the server. With the NFSv3 | ||||
| protocol (<xref target="RFC1813" format="default">RFC 1813</xref>), there | ||||
| exists an ancillary protocol to obtain this first filehandle. | ||||
| The MOUNT protocol, RPC program number 100005, provides the | ||||
| mechanism of translating a string-based file system pathname to | ||||
| a filehandle, which can then be used by the NFS protocols. | ||||
| </t> | ||||
| <t> | ||||
| The MOUNT protocol has deficiencies in the area of security and | ||||
| use via firewalls. This is one reason that the use of the | ||||
| public filehandle was introduced in <xref target="RFC2054" format="default">RFC 2054</xref> and <xref target="RFC2055" format="default">RFC 2055</xref>. With the use of the public | ||||
| filehandle in combination with the LOOKUP operation in the NFSv3 | ||||
| protocol, it has been demonstrated that the | ||||
| MOUNT protocol is unnecessary for viable interaction between NFS | ||||
| client and server. | ||||
| </t> | ||||
| <t> | ||||
| Therefore, the NFSv4.1 protocol will not use an ancillary | ||||
| protocol for translation from string-based pathnames to a filehandle. | ||||
| Two special filehandles will be used as starting points for the NFS | ||||
| client. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Root Filehandle</name> | ||||
| <t> | ||||
| The first of the special filehandles is the ROOT filehandle. The ROOT | ||||
| filehandle is the "conceptual" root of the file system namespace at | ||||
| the NFS server. The client uses or starts with the ROOT filehandle | ||||
| by employing the PUTROOTFH operation. The PUTROOTFH operation | ||||
| instructs the server to set the "current" filehandle to the ROOT of | ||||
| the server's file tree. Once this PUTROOTFH operation is used, the | ||||
| client can then traverse the entirety of the server's file tree with | ||||
| the LOOKUP operation. A complete discussion of the server namespace | ||||
| is in <xref target="single_server_namespace" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Public Filehandle</name> | ||||
| <t> | ||||
| The second special filehandle is the PUBLIC filehandle. Unlike the | ||||
| ROOT filehandle, the PUBLIC filehandle may be bound to or represent an | ||||
| arbitrary file system object at the server. The server is responsible | ||||
| for this binding. It may be that the PUBLIC filehandle and the ROOT | ||||
| filehandle refer to the same file system object. However, it is up to | ||||
| the administrative software at the server and the policies of the | ||||
| server administrator to define the binding of the PUBLIC filehandle | ||||
| and server file system object. The client may not make any | ||||
| assumptions about this binding. The client uses the PUBLIC filehandle | ||||
| via the PUTPUBFH operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Filehandle Types</name> | ||||
| <t> | ||||
| In the NFSv3 protocol, there was one type of filehandle | ||||
| with a single set of semantics. This type of filehandle is termed | ||||
| "persistent" in NFSv4.1. The semantics of a persistent | ||||
| filehandle remain the same as before. A new type of filehandle | ||||
| introduced in NFSv4.1 is the "volatile" filehandle, which | ||||
| attempts to accommodate certain server environments. | ||||
| </t> | ||||
| <t> | ||||
| The volatile filehandle type was introduced to address server | ||||
| functionality or implementation issues that make correct | ||||
| implementation of a persistent filehandle infeasible. Some server | ||||
| environments do not provide a file-system-level invariant that can be | ||||
| used to construct a persistent filehandle. The underlying server | ||||
| file system may not provide the invariant or the server's file system | ||||
| programming interfaces may not provide access to the needed invariant. | ||||
| Volatile filehandles may ease the implementation of server | ||||
| functionality such as hierarchical storage management or file system | ||||
| reorganization or migration. However, the volatile filehandle | ||||
| increases the implementation burden for the client. | ||||
| </t> | ||||
| <t> | ||||
| Since the client will need to handle persistent and volatile | ||||
| filehandles differently, a file attribute is defined that may be used | ||||
| by the client to determine the filehandle types being returned by the | ||||
| server. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>General Properties of a Filehandle</name> | ||||
| <t> | ||||
| The filehandle contains all the information the | ||||
| server needs to distinguish an individual file. | ||||
| To the client, the filehandle is opaque. The | ||||
| client stores filehandles for use in a later | ||||
| request and can compare two filehandles from the | ||||
| same server for equality by doing a byte-by-byte | ||||
| comparison. However, the client <bcp14>MUST NOT</bcp14> otherwise | ||||
| interpret the contents of filehandles. If two | ||||
| filehandles from the same server are equal, they | ||||
| <bcp14>MUST</bcp14> refer to the same file. Servers <bcp14>SHOULD</bcp14> try | ||||
| to maintain a one-to-one correspondence between | ||||
| filehandles and files, but this is not required. | ||||
| Clients <bcp14>MUST</bcp14> use filehandle comparisons only to | ||||
| improve performance, not for correct behavior. | ||||
| All clients need to be prepared for situations | ||||
| in which it cannot be determined whether two | ||||
| filehandles denote the same object and in such | ||||
| cases, avoid making invalid assumptions that might | ||||
| cause incorrect behavior. Further discussion | ||||
| of filehandle and attribute comparison in the | ||||
| context of data caching is presented in <xref target="data_caching_and_file_identity" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| As an example, in the case that two different pathnames when | ||||
| traversed at the server terminate at the same file system object, the | ||||
| server <bcp14>SHOULD</bcp14> return the same filehandle for each path. This can | ||||
| occur if a hard link (see <xref target="hardlink" format="default"/>) is used | ||||
| to create two file names that refer to the same underlying | ||||
| file object and associated data. For example, if paths /a/b/c | ||||
| and /a/d/c refer to the same file, the server <bcp14>SHOULD</bcp14> return | ||||
| the same filehandle for both pathnames' traversals. | ||||
| </t> | ||||
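| <t> | ||||
| The comparison rule can be sketched as a three-valued check: byte-identical filehandles from the same server denote the same object, while differing filehandles leave the question open. The function name same_object and the use of None for "cannot be determined" are assumptions made for this illustration. | ||||
| </t> | ||||

```python
from typing import Optional


def same_object(fh_a: bytes, fh_b: bytes) -> Optional[bool]:
    """Compare two filehandles obtained from the same server.

    Returns True when the handles are byte-for-byte equal. Returns None when
    they differ, because a server is not required to keep a one-to-one
    mapping between filehandles and files, so a mismatch proves nothing.
    """
    if fh_a == fh_b:
        return True
    return None
```

| <t> | ||||
| A caller following this sketch would treat None as "unknown" and fall back to behavior that is correct whether or not the two filehandles refer to the same object. | ||||
| </t> | ||||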
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Persistent Filehandle</name> | ||||
| <t> | ||||
| A persistent filehandle is defined as having a fixed value for the | ||||
| lifetime of the file system object to which it refers. Once the | ||||
| server creates the filehandle for a file system object, the server | ||||
| <bcp14>MUST</bcp14> accept the same filehandle for the object for the lifetime of the | ||||
| object. If the server restarts, the NFS server <bcp14>MUST</bcp14> honor | ||||
| the same filehandle value as it did in the server's previous | ||||
| instantiation. Similarly, if the file system is migrated, the new NFS | ||||
| server <bcp14>MUST</bcp14> honor the same filehandle as the old NFS server. | ||||
| </t> | ||||
| <t> | ||||
| The persistent filehandle will become stale or invalid when the | ||||
| file system object is removed. When the server is presented with a | ||||
| persistent filehandle that refers to a deleted object, it <bcp14>MUST</bcp14> return | ||||
| an error of NFS4ERR_STALE. A filehandle may become stale when the | ||||
| file system containing the object is no longer available. The file | ||||
| system may become unavailable if it exists on removable media and the | ||||
| media is no longer available at the server or the file system in whole | ||||
| has been destroyed or the file system has simply been removed from the | ||||
| server's namespace (i.e., unmounted in a UNIX environment). | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Volatile Filehandle</name> | ||||
| <t> | ||||
| A volatile filehandle does not share the longevity | ||||
| characteristics of a persistent filehandle. The server may | ||||
| determine that a volatile filehandle is no longer valid at many | ||||
| different points in time. If the server can definitively determine | ||||
| that a volatile filehandle refers to an object that has been removed, | ||||
| the server should return NFS4ERR_STALE to the client (as is the case | ||||
| for persistent filehandles). In all other cases where the server | ||||
| determines that a volatile filehandle can no longer be used, it should | ||||
| return an error of NFS4ERR_FHEXPIRED. | ||||
| </t> | ||||
| <t> | ||||
| The <bcp14>REQUIRED</bcp14> attribute "fh_expire_type" is used by the client to | ||||
| determine what type of filehandle the server is providing for a | ||||
| particular file system. This attribute is a bitmask with the | ||||
| following values: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>FH4_PERSISTENT</dt> | ||||
| <dd> | ||||
| The value of FH4_PERSISTENT is used to indicate a persistent | ||||
| filehandle, which is valid until the object is removed from the | ||||
| file system. The server will not return NFS4ERR_FHEXPIRED for this | ||||
| filehandle. FH4_PERSISTENT is defined as a value in which none of the | ||||
| bits specified below are set. | ||||
| </dd> | ||||
| <dt>FH4_VOLATILE_ANY</dt> | ||||
| <dd> | ||||
| The filehandle may expire at any time, except as specifically | ||||
| excluded (i.e., FH4_NOEXPIRE_WITH_OPEN). | ||||
| </dd> | ||||
| <dt>FH4_NOEXPIRE_WITH_OPEN</dt> | ||||
| <dd> | ||||
| May only be set when FH4_VOLATILE_ANY is set. If this bit is set, | ||||
| then the meaning of FH4_VOLATILE_ANY is qualified to exclude any | ||||
| expiration of the filehandle when it is open. | ||||
| </dd> | ||||
| <dt>FH4_VOL_MIGRATION</dt> | ||||
| <dd> | ||||
| The filehandle will expire as a result of a file system | ||||
| transition (migration or replication), in those cases in | ||||
| which the continuity of filehandle use is not specified by | ||||
| handle class information | ||||
| within the fs_locations_info attribute. When this bit is | ||||
| set, clients without access to fs_locations_info | ||||
| information should assume that filehandles will expire on file | ||||
| system transitions. | ||||
| </dd> | ||||
| <dt>FH4_VOL_RENAME</dt> | ||||
| <dd> | ||||
| The filehandle will expire during rename. This includes a rename by | ||||
| the requesting client or a rename by any other client. If FH4_VOLATILE_ANY | ||||
| is set, FH4_VOL_RENAME is redundant. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| Servers that provide volatile filehandles that can expire | ||||
| while open require special care as regards handling of RENAMEs | ||||
| and REMOVEs. This situation can arise if FH4_VOL_MIGRATION or | ||||
| FH4_VOL_RENAME is set, if FH4_VOLATILE_ANY is set and | ||||
| FH4_NOEXPIRE_WITH_OPEN is not set, or if a non-read-only file system | ||||
| has a transition target in a different handle | ||||
| class. In these cases, the server should deny a RENAME | ||||
| or REMOVE that would affect an OPEN file or any of the | ||||
| components leading to the OPEN file. In addition, the server | ||||
| should deny all RENAME or REMOVE requests during the grace period, | ||||
| in order to make sure that reclaims of files where filehandles | ||||
| may have expired do not do a reclaim for the wrong file. | ||||
| </t> | ||||
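| <t> | ||||
| A client-side reading of the fh_expire_type bitmask might look like the sketch below. The numeric constants are the values given for these names in the protocol's XDR description; the helper names flag_names and may_expire_while_open are assumptions for illustration. The second helper covers only the bit combinations named above and does not model the case of a transition target in a different handle class. | ||||
| </t> | ||||

```python
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008

_BITS = [
    ("FH4_NOEXPIRE_WITH_OPEN", FH4_NOEXPIRE_WITH_OPEN),
    ("FH4_VOLATILE_ANY", FH4_VOLATILE_ANY),
    ("FH4_VOL_MIGRATION", FH4_VOL_MIGRATION),
    ("FH4_VOL_RENAME", FH4_VOL_RENAME),
]


def flag_names(mask: int) -> list:
    """List the flag names set in an fh_expire_type value."""
    if mask == FH4_PERSISTENT:
        return ["FH4_PERSISTENT"]  # defined as "none of the bits set"
    return [name for name, bit in _BITS if mask & bit]


def may_expire_while_open(mask: int) -> bool:
    """True if the bits alone allow a filehandle to expire while it is open."""
    if mask & (FH4_VOL_MIGRATION | FH4_VOL_RENAME):
        return True
    # FH4_VOLATILE_ANY counts unless qualified by FH4_NOEXPIRE_WITH_OPEN.
    return bool(mask & FH4_VOLATILE_ANY) and not (mask & FH4_NOEXPIRE_WITH_OPEN)
```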
| <t> | ||||
| Volatile filehandles are especially suitable for implementation | ||||
| of the pseudo file systems used to bridge exports. See | ||||
| <xref target="pseudo_fs_volatility" format="default"/> for a discussion of this. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>One Method of Constructing a Volatile Filehandle</name> | ||||
| <t> | ||||
| A volatile filehandle, while opaque to the client, could contain: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| [volatile bit = 1 | server boot time | slot | generation number] | ||||
| ]]></sourcecode> | ||||
| <ul> | ||||
| <li>slot is an index in the server volatile filehandle table</li> | ||||
| <li>generation number is the generation number for the table entry/slot</li> | ||||
| </ul> | ||||
| <t> | ||||
| When the client presents a volatile filehandle, the server makes the | ||||
| following checks, which assume that the check for the volatile bit has | ||||
| passed. If the server boot time is less than the current server boot | ||||
| time, return NFS4ERR_FHEXPIRED. If slot is out of range, return | ||||
| NFS4ERR_BADHANDLE. If the generation number does not match, return | ||||
| NFS4ERR_FHEXPIRED. | ||||
| </t> | ||||
| <t> | ||||
| When the server restarts, the table is gone (it is volatile). | ||||
| </t> | ||||
| <t> | ||||
| If the volatile bit is 0, then it is a persistent filehandle with a | ||||
| different structure following it. | ||||
| </t> | ||||
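| <t> | ||||
| The checks described in this section can be sketched as follows. The field widths in the packed layout, the class name VolatileHandleTable, and the use of status names as strings are assumptions chosen for the example; a real server is free to lay out its filehandles differently. | ||||
| </t> | ||||

```python
import struct

# Assumed layout: 1-byte volatile bit, then three 32-bit big-endian fields
# (server boot time, slot, generation number).
_LAYOUT = ">BIII"


class VolatileHandleTable:
    """In-memory table of volatile filehandles; it is lost on restart."""

    def __init__(self, boot_time: int, size: int):
        self.boot_time = boot_time
        self.generations = [0] * size  # one generation number per slot

    def issue(self, slot: int) -> bytes:
        """Build a volatile filehandle for the given slot."""
        return struct.pack(_LAYOUT, 1, self.boot_time, slot, self.generations[slot])

    def invalidate(self, slot: int) -> None:
        """Bump the generation so earlier handles for this slot expire."""
        self.generations[slot] += 1

    def check(self, handle: bytes) -> str:
        """Apply the checks from the text, in the order they are listed."""
        volatile, boot, slot, generation = struct.unpack(_LAYOUT, handle)
        if volatile != 1:
            raise ValueError("not a volatile filehandle")
        if boot < self.boot_time:
            return "NFS4ERR_FHEXPIRED"   # issued before the last restart
        if slot >= len(self.generations):
            return "NFS4ERR_BADHANDLE"   # slot out of range
        if generation != self.generations[slot]:
            return "NFS4ERR_FHEXPIRED"   # slot was reused or invalidated
        return "NFS4_OK"
```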
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Client Recovery from Filehandle Expiration</name> | ||||
| <t> | ||||
| If possible, the client <bcp14>SHOULD</bcp14> recover from the receipt of an | ||||
| NFS4ERR_FHEXPIRED error. The client must take on additional | ||||
| responsibility so that it may prepare itself to recover from the | ||||
| expiration of a volatile filehandle. If the server returns persistent | ||||
| filehandles, the client does not need these additional steps. | ||||
| </t> | ||||
| <t> | ||||
| For volatile filehandles, most commonly the client will need to store | ||||
| the component names leading up to and including the file system object | ||||
| in question. With these names, the client should be able to recover | ||||
| by finding a filehandle in the namespace that is still available or | ||||
| by starting at the root of the server's file system namespace. | ||||
| </t> | ||||
| <t> | ||||
| If the expired filehandle refers to an object that has been removed | ||||
| from the file system, obviously the client will not be able to recover | ||||
| from the expired filehandle. | ||||
| </t> | ||||
| <t> | ||||
| It is also possible that the expired filehandle refers to a file that | ||||
| has been renamed. If the file was renamed by another client, again it | ||||
| is possible that the original client will not be able to recover. | ||||
| However, in the case that the client itself is renaming the file and | ||||
| the file is open, it is possible that the client may be able to | ||||
| recover. The client can determine the new pathname based on the | ||||
| processing of the rename request. The client can then regenerate the | ||||
| new filehandle based on the new pathname. The client could also use | ||||
| the COMPOUND procedure to construct a series of operations | ||||
| like: | ||||
| </t> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| RENAME A B | ||||
| LOOKUP B | ||||
| GETFH | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Note that the COMPOUND procedure does not provide atomicity. This | ||||
| example only reduces the overhead of recovering from an expired | ||||
| filehandle. | ||||
| </t> | ||||
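| <t> | ||||
| Recovery by re-walking stored component names can be sketched as below. The callable lookup stands in for whatever mechanism the client uses to issue a LOOKUP and fetch the resulting filehandle; it, the function name recover_filehandle, and the convention of returning None on failure are assumptions made for this example. | ||||
| </t> | ||||

```python
from typing import Callable, Iterable, Optional


def recover_filehandle(
    lookup: Callable[[bytes, str], Optional[bytes]],
    start_fh: bytes,
    components: Iterable[str],
) -> Optional[bytes]:
    """Re-derive a filehandle by walking saved component names.

    `start_fh` is a filehandle that is still valid, for example the one the
    client obtained for the root of the server's namespace. `lookup` returns
    the child's filehandle, or None if the name no longer resolves.
    """
    fh = start_fh
    for name in components:
        fh = lookup(fh, name)
        if fh is None:
            return None  # object removed or renamed elsewhere: no recovery
    return fh
```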
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="file_attributes" numbered="true" toc="default"> | ||||
| <name>File Attributes</name> | ||||
| <t> | ||||
| To meet the requirements of extensibility and increased | ||||
| interoperability with non-UNIX platforms, attributes need to be handled | ||||
| in a flexible manner. The NFSv3 fattr3 structure contains a | ||||
| fixed list of attributes that not all clients and servers are able to | ||||
| support or care about. The fattr3 structure cannot be extended as | ||||
| new needs arise and it provides no way to indicate non-support. With | ||||
| the NFSv4.1 protocol, the client is able to query what attributes | ||||
| the server supports and construct requests with only those supported | ||||
| attributes (or a subset thereof). | ||||
| </t> | ||||
| <t> | ||||
| To this end, attributes are divided into three groups: <bcp14>REQUIRED</bcp14>, | ||||
| <bcp14>RECOMMENDED</bcp14>, and named. Both <bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> attributes are | ||||
| supported in the NFSv4.1 protocol by a specific and well-defined | ||||
| encoding and are identified by number. They are requested by setting | ||||
| a bit in the bit vector sent in the GETATTR request; the server | ||||
| response includes a bit vector to list what attributes were returned | ||||
| in the response. New <bcp14>REQUIRED</bcp14> or <bcp14>RECOMMENDED</bcp14> attributes may be added | ||||
| to the NFSv4 protocol as part of a new minor version | ||||
| by publishing a | ||||
| Standards Track RFC that allocates a new attribute number value and | ||||
| defines the encoding for the attribute. See | ||||
| <xref target="minor_versioning" format="default"/> for further | ||||
| discussion. | ||||
| </t> | ||||
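| <t> | ||||
| Requesting attributes by number amounts to setting bits in a vector of 32-bit words. The sketch below assumes the usual bitmap4 convention in which attribute n occupies bit (n mod 32) of word (n / 32); the function names are hypothetical and chosen only for this illustration. | ||||
| </t> | ||||

```python
def attrs_to_bitmap(attr_numbers) -> list:
    """Encode attribute numbers as a list of 32-bit words."""
    words = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)  # grow the vector only as far as needed
        words[word] |= 1 << bit
    return words


def bitmap_to_attrs(words) -> list:
    """Decode a bit vector back into a sorted list of attribute numbers."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]
```

| <t> | ||||
| In this sketch, a client would pass the result of attrs_to_bitmap in its GETATTR request and apply bitmap_to_attrs to the bit vector in the response to learn which attributes were actually returned. | ||||
| </t> | ||||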
| <t> | ||||
| Named attributes are accessed by the new OPENATTR operation, which | ||||
| accesses a hidden directory of attributes associated with a file | ||||
| system object. OPENATTR takes a filehandle for the object and returns | ||||
| the filehandle for the attribute hierarchy. The filehandle for the | ||||
| named attributes is a directory object accessible by LOOKUP or READDIR | ||||
| and contains files whose names represent the named attributes and | ||||
| whose data bytes are the value of the attribute. For example: | ||||
| </t> | ||||
| <table align="center" anchor="table3"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left"/> | ||||
| <th align="left"/> | ||||
| <th align="left"/> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">LOOKUP</td> | ||||
| <td align="left">"foo"</td> | ||||
| <td align="left">; look up file</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GETATTR</td> | ||||
| <td align="left">attrbits</td> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">OPENATTR</td> | ||||
| <td align="left"/> | ||||
| <td align="left">; access foo's named attributes</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOOKUP</td> | ||||
| <td align="left">"x11icon"</td> | ||||
| <td align="left">; look up specific attribute</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">READ</td> | ||||
| <td align="left">0,4096</td> | ||||
| <td align="left">; read stream of bytes</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| Named attributes are intended for data needed by applications rather | ||||
| than by an NFS client implementation. NFS implementors are strongly | ||||
| encouraged to define their new attributes as <bcp14>RECOMMENDED</bcp14> attributes by | ||||
| bringing them to the IETF Standards Track process. | ||||
| </t> | ||||
| <t> | ||||
| The set of attributes that are classified as <bcp14>REQUIRED</bcp14> is | ||||
| deliberately small since servers need to do whatever it takes to support | ||||
| them. A server should support as many of the <bcp14>RECOMMENDED</bcp14> attributes | ||||
| as possible but, by their definition, the server is not required to | ||||
| support all of them. Attributes are deemed <bcp14>REQUIRED</bcp14> if the data is | ||||
| both needed by a large number of clients and is not otherwise | ||||
| reasonably computable by the client when support is not provided on | ||||
| the server. | ||||
| </t> | ||||
| <t> | ||||
| Note that the hidden directory returned by OPENATTR is a convenience | ||||
| for protocol processing. The client should not make any assumptions | ||||
| about the server's implementation of named attributes and whether | ||||
| or not the underlying file system at the server has a named | ||||
| attribute directory. Therefore, operations such as SETATTR and | ||||
| GETATTR on the named attribute directory are undefined. | ||||
| </t> | ||||
| <section anchor="mandatory_attributes_intro" numbered="true" toc="default"> | ||||
| <name><bcp14>REQUIRED</bcp14> Attributes</name> | ||||
| <t> | ||||
| These <bcp14>MUST</bcp14> be supported by every NFSv4.1 client and server in | ||||
| order to ensure a minimum level of interoperability. The server <bcp14>MUST</bcp14> | ||||
| store and return these attributes, and the client <bcp14>MUST</bcp14> be able to | ||||
| function with an attribute set limited to these attributes. With just | ||||
| the <bcp14>REQUIRED</bcp14> attributes some client functionality may be impaired or | ||||
| limited in some ways. A client may ask for any of these attributes to | ||||
| be returned by setting a bit in the GETATTR request, and the server | ||||
| <bcp14>MUST</bcp14> return their value. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="recommended_attributes_intro" numbered="true" toc="default"> | ||||
| <name><bcp14>RECOMMENDED</bcp14> Attributes</name> | ||||
| <t> | ||||
| These attributes are understood well enough to warrant support in the | ||||
| NFSv4.1 protocol. However, they may not be supported on all | ||||
| clients and servers. A client may ask for any of these attributes to | ||||
| be returned by setting a bit in the GETATTR request but must handle | ||||
| the case where the server does not return them. A client <bcp14>MAY</bcp14> ask for | ||||
| the set of attributes the server supports and <bcp14>SHOULD NOT</bcp14> request | ||||
| attributes the server does not support. A server should be tolerant | ||||
| of requests for unsupported attributes and simply not return them | ||||
| rather than considering the request an error. It is expected that | ||||
| servers will support all attributes they comfortably can and only fail | ||||
| to support attributes that are difficult to support in their | ||||
| operating environments. A server should provide attributes whenever | ||||
| it does not have to "tell lies" to the client. For example, a file | ||||
| modification time should be either an accurate time or should not be | ||||
| supported by the server. At times this will be difficult for | ||||
| clients, but a client is better positioned to decide whether and how to | ||||
| fabricate or construct an attribute or whether to do without the | ||||
| attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="named_attributes_intro" numbered="true" toc="default"> | ||||
| <name>Named Attributes</name> | ||||
| <t> | ||||
| These attributes are not supported by direct encoding in the NFSv4 | ||||
| protocol but are accessed by string names rather than | ||||
| numbers and correspond to an uninterpreted stream of bytes that is | ||||
| stored with the file system object. The namespace for these | ||||
| attributes may be accessed by using the OPENATTR operation. The | ||||
| OPENATTR operation returns a filehandle for a virtual "named attribute | ||||
| directory", and further perusal and modification of the namespace may | ||||
| be done using operations that work on more typical directories. In | ||||
| particular, READDIR may be used to get a list of such named attributes, | ||||
| and LOOKUP and OPEN may select a particular attribute. Creation of | ||||
| a new named attribute may be the result of an OPEN specifying file | ||||
| creation. | ||||
| </t> | ||||
| <t> | ||||
| Once an OPEN is done, named attributes may be examined and changed | ||||
| by normal READ and WRITE operations using the filehandles and stateids | ||||
| returned by OPEN. | ||||
| </t> | ||||
| <t> | ||||
| Named attributes and the named attribute directory may have | ||||
| their own (non-named) attributes. Each of these objects <bcp14>MUST</bcp14> have all | ||||
| of the <bcp14>REQUIRED</bcp14> attributes and may have additional <bcp14>RECOMMENDED</bcp14> | ||||
| attributes. However, the set of attributes for named attributes | ||||
| and the named attribute directory need not be, and | ||||
| typically will not be, as large as that for other objects in that | ||||
| file system. | ||||
| </t> | ||||
| <t> | ||||
| Named attributes and the named attribute directory might be the | ||||
| target of delegations (in the case of the named attribute directory, | ||||
| these will be directory delegations). However, since granting of | ||||
| delegations is at the server's discretion, a server | ||||
| need not support delegations on named attributes or the named | ||||
| attribute directory. | ||||
| </t> | ||||
| <t> | ||||
| It is <bcp14>RECOMMENDED</bcp14> that servers support arbitrary named attributes. A | ||||
| client should not depend on the ability to store any named attributes | ||||
| in the server's file system. If a server does support named | ||||
| attributes, a client that is also able to handle them should be able | ||||
| to copy a file's data and metadata with complete transparency from | ||||
| one location to another; this would imply that names allowed for | ||||
| regular directory entries are valid for named attribute names as well. | ||||
| </t> | ||||
| <t> | ||||
| In NFSv4.1, the structure of named attribute directories is | ||||
| restricted in a number of ways, in order to prevent the development | ||||
| of non-interoperable implementations in which some servers support | ||||
| a fully general hierarchical directory structure for named attributes | ||||
| while others support a limited but adequate structure for named attributes. | ||||
| In such an environment, clients or applications might come to | ||||
| depend on non-portable extensions. The restrictions are: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| CREATE is not allowed in a named attribute directory. Thus, such | ||||
| objects as symbolic links and special files are not allowed to | ||||
| be named attributes. Further, directories may not be created | ||||
| in a named attribute directory, so no hierarchical structure of | ||||
| named attributes for a single object is allowed. | ||||
| </li> | ||||
| <li> | ||||
| If OPENATTR is done on a named attribute directory or on | ||||
| a named attribute, the server <bcp14>MUST</bcp14> return NFS4ERR_WRONG_TYPE. | ||||
| </li> | ||||
| <li> | ||||
| Doing a RENAME of a named attribute to a different named | ||||
| attribute directory or to an ordinary (i.e., non-named-attribute) | ||||
| directory is not allowed. | ||||
| </li> | ||||
| <li> | ||||
| Creating hard links between named attribute directories or | ||||
| between named attribute directories and ordinary directories | ||||
| is not allowed. | ||||
| </li> | ||||
| </ul> | ||||
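| <t> | ||||
| As a non-normative illustration, the first three restrictions above might be checked on a server roughly as follows. The function names and the use of plain integers for directory identity are assumptions made for this sketch; only the nfs_ftype4 values and NFS4ERR_WRONG_TYPE come from the protocol. The text above does not name the errors returned for the CREATE and RENAME cases, so the sketch reports those as simple booleans. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Non-normative sketch. The helper names are hypothetical; the numeric
# values are the nfs_ftype4 constants from the protocol's XDR description.
NF4REG, NF4DIR, NF4ATTRDIR, NF4NAMEDATTR = 1, 2, 8, 9


def openattr_status(object_type):
    """OPENATTR on a named attribute directory or a named attribute fails."""
    if object_type in (NF4ATTRDIR, NF4NAMEDATTR):
        return "NFS4ERR_WRONG_TYPE"
    # Assumption: no other error condition applies in this simplified model.
    return "NFS4_OK"


def create_allowed(parent_type):
    """CREATE is refused when the parent is a named attribute directory."""
    return parent_type != NF4ATTRDIR


def named_attr_rename_allowed(src_dir_id, dst_dir_id):
    """A named attribute may be renamed only within its own directory."""
    return src_dir_id == dst_dir_id
```
| ]]></sourcecode> | ||||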
| <t> | ||||
| Names of attributes will not be controlled by this document or other | ||||
| IETF Standards Track documents. See | ||||
| <xref target="namedattributesiana" format="default"/> | ||||
| for further discussion. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Classification of Attributes</name> | ||||
| <t> | ||||
| Each of the <bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> attributes can be classified in | ||||
| one of three categories: per server (i.e., the value of the attribute will | ||||
| be the same for all file objects that share the same | ||||
| server owner; see <xref target="Server_Owners" format="default"/> for a definition of server | ||||
| owner), per file system (i.e., the value of the attribute will | ||||
| be the same for some or all file objects that share the | ||||
| same <xref target="attrdef_fsid" format="default">fsid attribute</xref> and | ||||
| server owner), or per file system | ||||
| object. Note that it is possible that some per file system attributes | ||||
| may vary within the file system, depending on the value of | ||||
| the <xref target="attrdef_homogeneous" format="default">"homogeneous"</xref> | ||||
| attribute. Note that the attributes time_access_set and | ||||
| time_modify_set are not listed in this section because they are | ||||
| write-only attributes corresponding to time_access and time_modify, | ||||
| and are used in a special instance of SETATTR. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The per-server attribute is: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| lease_time | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The per-file system attributes are: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| supported_attrs, suppattr_exclcreat, fh_expire_type, link_support, | ||||
| symlink_support, unique_handles, aclsupport, | ||||
| cansettime, case_insensitive, case_preserving, | ||||
| chown_restricted, files_avail, files_free, | ||||
| files_total, fs_locations, homogeneous, maxfilesize, | ||||
| maxname, maxread, maxwrite, no_trunc, space_avail, | ||||
| space_free, space_total, time_delta, | ||||
| change_policy, fs_status, | ||||
| fs_layout_type, fs_locations_info, fs_charset_cap | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The per-file system object attributes are: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| type, change, size, named_attr, fsid, rdattr_error, | ||||
| filehandle, acl, archive, fileid, hidden, maxlink, | ||||
| mimetype, mode, numlinks, owner, owner_group, rawdev, | ||||
| space_used, system, time_access, time_backup, | ||||
| time_create, time_metadata, time_modify, | ||||
| mounted_on_fileid, dir_notif_delay, dirent_notif_delay, | ||||
| dacl, sacl, | ||||
| layout_type, layout_hint, layout_blksize, layout_alignment, | ||||
| mdsthreshold, retention_get, retention_set, retentevt_get, | ||||
| retentevt_set, retention_hold, mode_set_masked | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| For quota_avail_hard, quota_avail_soft, and quota_used, see their | ||||
| definitions below for the appropriate classification. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="rw_attr" numbered="true" toc="default"> | ||||
| <name>Set-Only and Get-Only Attributes</name> | ||||
| <t> | ||||
| Some <bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> attributes are set-only; i.e., they | ||||
| can be set via SETATTR but not retrieved via GETATTR. Similarly, some | ||||
| <bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> attributes are get-only; i.e., they | ||||
| can be retrieved via GETATTR but not set via SETATTR. If a client attempts | ||||
| to set a get-only attribute or get a set-only attribute, the server | ||||
| <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| </t> | ||||
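| <t> | ||||
| The rule above can be sketched, non-normatively, as a lookup against the access codes given in the Acc column of the attribute tables in this document. The dictionary below covers only a handful of attributes, and the names ACC and access_status are assumptions made for this illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Non-normative sketch. Access codes are copied from the "Acc" column of
# the attribute tables for a few attributes; the helper name is hypothetical.
ACC = {
    "supported_attrs": "R",    # get-only
    "size": "R W",             # may be both retrieved and set
    "time_access_set": "W",    # set-only
    "time_modify_set": "W",    # set-only
}


def access_status(operation, attr_name):
    """Return NFS4ERR_INVAL when the operation conflicts with the Acc code."""
    acc = ACC[attr_name]
    if operation == "GETATTR" and "R" not in acc:
        return "NFS4ERR_INVAL"
    if operation == "SETATTR" and "W" not in acc:
        return "NFS4ERR_INVAL"
    # Assumption: no other error condition applies in this simplified model.
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||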
| </section> | ||||
| <section anchor="mandatory_attributes" numbered="true" toc="default"> | ||||
| <name><bcp14>REQUIRED</bcp14> Attributes - List and Definition References</name> | ||||
| <t> | ||||
| The list of <bcp14>REQUIRED</bcp14> attributes appears in <xref target="req_attr_table" format="default"/>. | ||||
| The meanings of the columns of the table are: | ||||
| </t> | ||||
| <dl spacing="normal"> | ||||
| <dt>Name:</dt><dd>The name of the attribute.</dd> | ||||
| <dt>Id:</dt><dd>The number assigned to the attribute. In | ||||
| the event of conflicts between the assigned number and <xref target="RFC5662" format="default"/>, the latter is | ||||
| likely authoritative, but should be resolved with Errata to | ||||
| this document and/or | ||||
| <xref target="RFC5662" format="default"/>. See <xref target="errata" format="default"/> for the Errata process.</dd> | ||||
| <dt>Data Type:</dt><dd>The XDR data type of the attribute.</dd> | ||||
| <dt>Acc:</dt><dd>Access allowed to the attribute. R means | ||||
| read-only (GETATTR may retrieve, SETATTR may not | ||||
| set). W means write-only (SETATTR may set, GETATTR | ||||
| may not retrieve). R W means read/write (GETATTR | ||||
| may retrieve, SETATTR may set).</dd> | ||||
| <dt>Defined in:</dt><dd>The section of this specification that describes the | ||||
| attribute.</dd> | ||||
| </dl> | ||||
| <table anchor="req_attr_table" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Name</th> | ||||
| <th align="left">Id</th> | ||||
| <th align="left">Data Type</th> | ||||
| <th align="left">Acc</th> | ||||
| <th align="left">Defined in:</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">supported_attrs</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left">bitmap4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_supp_attr" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">type</td> | ||||
| <td align="left">1</td> | ||||
| <td align="left">nfs_ftype4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_type" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fh_expire_type</td> | ||||
| <td align="left">2</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fh_expire_type" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">change</td> | ||||
| <td align="left">3</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_change" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">size</td> | ||||
| <td align="left">4</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_size" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">link_support</td> | ||||
| <td align="left">5</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_link_support" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">symlink_support</td> | ||||
| <td align="left">6</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_symlink_support" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">named_attr</td> | ||||
| <td align="left">7</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_named_attr" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fsid</td> | ||||
| <td align="left">8</td> | ||||
| <td align="left">fsid4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fsid" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">unique_handles</td> | ||||
| <td align="left">9</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_unique_handles" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">lease_time</td> | ||||
| <td align="left">10</td> | ||||
| <td align="left">nfs_lease4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_lease_time" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">rdattr_error</td> | ||||
| <td align="left">11</td> | ||||
| <td align="left">enum</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_rdattr_error" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">filehandle</td> | ||||
| <td align="left">19</td> | ||||
| <td align="left">nfs_fh4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_filehandle" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">suppattr_exclcreat</td> | ||||
| <td align="left">75</td> | ||||
| <td align="left">bitmap4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_suppattr_exclcreat" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
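| <t> | ||||
| As a non-normative illustration of how the Id column is used, the sketch below packs the attribute numbers from the table above into a bitmap4, assuming the layout in which attribute n occupies bit (n mod 32) of 32-bit word (n div 32). The name encode_bitmap4 is an assumption made for this example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Non-normative sketch. Ids are taken from the REQUIRED attribute table;
# the function name is hypothetical.
REQUIRED_ATTR_IDS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 19, 75]


def encode_bitmap4(attr_ids):
    """Pack attribute numbers into a list of 32-bit words (a bitmap4)."""
    if not attr_ids:
        return []
    words = [0] * (max(attr_ids) // 32 + 1)
    for n in attr_ids:
        # Attribute n lives in word n // 32, at bit position n % 32.
        words[n // 32] |= 1 << (n % 32)
    return words


required_bitmap = encode_bitmap4(REQUIRED_ATTR_IDS)
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Under these assumptions, the REQUIRED attributes need three words because suppattr_exclcreat (75) falls in the third word, while the second word stays zero. | ||||
| </t> | ||||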
| </section> | ||||
| <section anchor="recommended_attributes" numbered="true" toc="default"> | ||||
| <name><bcp14>RECOMMENDED</bcp14> Attributes - List and Definition References</name> | ||||
| <t> | ||||
| The <bcp14>RECOMMENDED</bcp14> attributes are defined in | ||||
| <xref target="rec_attr_tbl" format="default"/>. The meanings | ||||
| of the column headers are the same as | ||||
| <xref target="req_attr_table" format="default"/>; see <xref target="mandatory_attributes" format="default"/> for the meanings. | ||||
| </t> | ||||
| <table anchor="rec_attr_tbl" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Name</th> | ||||
| <th align="left">Id</th> | ||||
| <th align="left">Data Type</th> | ||||
| <th align="left">Acc</th> | ||||
| <th align="left">Defined in:</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">acl</td> | ||||
| <td align="left">12</td> | ||||
| <td align="left">nfsace4&lt;&gt;</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_acl" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">aclsupport</td> | ||||
| <td align="left">13</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_aclsupport" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">archive</td> | ||||
| <td align="left">14</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_archive" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">cansettime</td> | ||||
| <td align="left">15</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_cansettime" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">case_insensitive</td> | ||||
| <td align="left">16</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_case_insensitive" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">case_preserving</td> | ||||
| <td align="left">17</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_case_preserving" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">change_policy</td> | ||||
| <td align="left">60</td> | ||||
| <td align="left">chg_policy4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_change_policy" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">chown_restricted</td> | ||||
| <td align="left">18</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_chown_restricted" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">dacl</td> | ||||
| <td align="left">58</td> | ||||
| <td align="left">nfsacl41</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_dacl" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">dir_notif_delay</td> | ||||
| <td align="left">56</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_dir_notif_delay" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">dirent_notif_delay</td> | ||||
| <td align="left">57</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_dirent_notif_delay" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fileid</td> | ||||
| <td align="left">20</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fileid" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">files_avail</td> | ||||
| <td align="left">21</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_files_avail" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">files_free</td> | ||||
| <td align="left">22</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_files_free" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">files_total</td> | ||||
| <td align="left">23</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_files_total" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fs_charset_cap</td> | ||||
| <td align="left">76</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fs_charset_cap" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fs_layout_type</td> | ||||
| <td align="left">62</td> | ||||
| <td align="left">layouttype4&lt;&gt;</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fs_layout_type" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fs_locations</td> | ||||
| <td align="left">24</td> | ||||
| <td align="left">fs_locations</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fs_locations" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fs_locations_info</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">fs_locations_info4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fs_locations_info" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">fs_status</td> | ||||
| <td align="left">61</td> | ||||
| <td align="left">fs4_status</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_fs_status" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">hidden</td> | ||||
| <td align="left">25</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_hidden" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">homogeneous</td> | ||||
| <td align="left">26</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_homogeneous" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">layout_alignment</td> | ||||
| <td align="left">66</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_layout_alignment" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">layout_blksize</td> | ||||
| <td align="left">65</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_layout_blksize" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">layout_hint</td> | ||||
| <td align="left">63</td> | ||||
| <td align="left">layouthint4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_layout_hint" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">layout_type</td> | ||||
| <td align="left">64</td> | ||||
| <td align="left">layouttype4&lt;&gt;</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_layout_type" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">maxfilesize</td> | ||||
| <td align="left">27</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_maxfilesize" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">maxlink</td> | ||||
| <td align="left">28</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_maxlink" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">maxname</td> | ||||
| <td align="left">29</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_maxname" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">maxread</td> | ||||
| <td align="left">30</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_maxread" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">maxwrite</td> | ||||
| <td align="left">31</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_maxwrite" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mdsthreshold</td> | ||||
| <td align="left">68</td> | ||||
| <td align="left">mdsthreshold4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_mdsthreshold" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mimetype</td> | ||||
| <td align="left">32</td> | ||||
| <td align="left">utf8str_cs</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_mimetype" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mode</td> | ||||
| <td align="left">33</td> | ||||
| <td align="left">mode4</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_mode" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mode_set_masked</td> | ||||
| <td align="left">74</td> | ||||
| <td align="left">mode_masked4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_mode_set_masked" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">mounted_on_fileid</td> | ||||
| <td align="left">55</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_mounted_on_fileid" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">no_trunc</td> | ||||
| <td align="left">34</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_no_trunc" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">numlinks</td> | ||||
| <td align="left">35</td> | ||||
| <td align="left">uint32_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_numlinks" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">owner</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">utf8str_mixed</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_owner" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">owner_group</td> | ||||
| <td align="left">37</td> | ||||
| <td align="left">utf8str_mixed</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_owner_group" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">quota_avail_hard</td> | ||||
| <td align="left">38</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_quota_avail_hard" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">quota_avail_soft</td> | ||||
| <td align="left">39</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_quota_avail_soft" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">quota_used</td> | ||||
| <td align="left">40</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_quota_used" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">rawdev</td> | ||||
| <td align="left">41</td> | ||||
| <td align="left">specdata4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_rawdev" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">retentevt_get</td> | ||||
| <td align="left">71</td> | ||||
| <td align="left">retention_get4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_retentevt_get" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">retentevt_set</td> | ||||
| <td align="left">72</td> | ||||
| <td align="left">retention_set4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_retentevt_set" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">retention_get</td> | ||||
| <td align="left">69</td> | ||||
| <td align="left">retention_get4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_retention_get" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">retention_hold</td> | ||||
| <td align="left">73</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_retention_hold" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">retention_set</td> | ||||
| <td align="left">70</td> | ||||
| <td align="left">retention_set4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_retention_set" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">sacl</td> | ||||
| <td align="left">59</td> | ||||
| <td align="left">nfsacl41</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_sacl" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">space_avail</td> | ||||
| <td align="left">42</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_space_avail" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">space_free</td> | ||||
| <td align="left">43</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_space_free" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">space_total</td> | ||||
| <td align="left">44</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_space_total" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">space_used</td> | ||||
| <td align="left">45</td> | ||||
| <td align="left">uint64_t</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_space_used" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">system</td> | ||||
| <td align="left">46</td> | ||||
| <td align="left">bool</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_system" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_access</td> | ||||
| <td align="left">47</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_access" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_access_set</td> | ||||
| <td align="left">48</td> | ||||
| <td align="left">settime4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_access_set" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_backup</td> | ||||
| <td align="left">49</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_backup" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_create</td> | ||||
| <td align="left">50</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_create" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_delta</td> | ||||
| <td align="left">51</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_delta" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_metadata</td> | ||||
| <td align="left">52</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_metadata" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_modify</td> | ||||
| <td align="left">53</td> | ||||
| <td align="left">nfstime4</td> | ||||
| <td align="left">R</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_modify" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">time_modify_set</td> | ||||
| <td align="left">54</td> | ||||
| <td align="left">settime4</td> | ||||
| <td align="left">&nbsp;&nbsp;W</td> | ||||
| <td align="left"> | ||||
| <xref target="attrdef_time_modify_set" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <section anchor="attribute_definitions" numbered="true" toc="default"> | ||||
| <name>Attribute Definitions</name> | ||||
| <section anchor="required_attr" numbered="true" toc="default"> | ||||
| <name>Definitions of <bcp14>REQUIRED</bcp14> Attributes</name> | ||||
| <section toc="exclude" anchor="attrdef_supp_attr" numbered="true"> | ||||
| <name>Attribute 0: supported_attrs</name> | ||||
| <t> | ||||
| The bit vector that would retrieve all <bcp14>REQUIRED</bcp14> and | ||||
| <bcp14>RECOMMENDED</bcp14> attributes that are supported for this object. | ||||
| The scope of this attribute applies to all objects with a | ||||
| matching fsid. | ||||
| </t> | ||||
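| <t> | ||||
| A client might interpret a returned supported_attrs value along the following lines. This is a non-normative sketch that assumes the bitmap4 layout in which attribute n occupies bit (n mod 32) of word (n div 32); the function names and the sample bitmap are assumptions made for this illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Non-normative sketch; function names and sample data are hypothetical.
def attr_supported(bitmap, attr_id):
    """True if the bit for attr_id is set in a bitmap4 (list of words)."""
    word, bit = divmod(attr_id, 32)
    # Words beyond the end of the array are treated as all zeros.
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))


def decode_bitmap4(bitmap):
    """List every attribute number whose bit is set, in ascending order."""
    return [
        w * 32 + b
        for w, word in enumerate(bitmap)
        for b in range(32)
        if word & (1 << b)
    ]


# Sample value: a server advertising only attributes 0-11, 19, and 75.
sample_supported = [0x80FFF, 0x0, 0x800]
```
| ]]></sourcecode> | ||||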
| </section> | ||||
| <section toc="exclude" anchor="attrdef_type" numbered="true"> | ||||
| <name>Attribute 1: type</name> | ||||
| <t> | ||||
| Designates the type of an object in terms of one of a number | ||||
| of special constants: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| NF4REG designates a regular file. | ||||
| </li> | ||||
| <li> | ||||
| NF4DIR designates a directory. | ||||
| </li> | ||||
| <li> | ||||
| NF4BLK designates a block device special file. | ||||
| </li> | ||||
| <li> | ||||
| NF4CHR designates a character device special file. | ||||
| </li> | ||||
| <li> | ||||
| NF4LNK designates a symbolic link. | ||||
| </li> | ||||
| <li> | ||||
| NF4SOCK designates a named socket special file. | ||||
| </li> | ||||
| <li> | ||||
| NF4FIFO designates a fifo special file. | ||||
| </li> | ||||
| <li> | ||||
| NF4ATTRDIR designates a named attribute directory. | ||||
| </li> | ||||
| <li> | ||||
| NF4NAMEDATTR designates a named attribute. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Within the explanatory text and operation descriptions, the | ||||
| following phrases will be used with the meanings given below: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The phrase "is a directory" means that the object's | ||||
| type attribute is NF4DIR or NF4ATTRDIR. | ||||
| </li> | ||||
| <li> | ||||
| The phrase "is a special file" means that the object's type | ||||
| attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. | ||||
| </li> | ||||
| <li> | ||||
| The phrases "is an ordinary file" and | ||||
| "is a regular file" mean that the object's | ||||
| type attribute is NF4REG or NF4NAMEDATTR. | ||||
| </li> | ||||
| </ul> | ||||
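<t>
The three phrases defined above can be sketched as predicates over the type attribute. This is a minimal illustration; the numeric values are assumed to match the nfs_ftype4 enumeration in the protocol's XDR description, and the function names are hypothetical.
</t>

```python
from enum import IntEnum


class NfsFtype4(IntEnum):
    # Values assumed to follow the nfs_ftype4 enumeration.
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7
    NF4ATTRDIR = 8
    NF4NAMEDATTR = 9


def is_directory(ftype):
    """'is a directory': NF4DIR or NF4ATTRDIR."""
    return ftype in (NfsFtype4.NF4DIR, NfsFtype4.NF4ATTRDIR)


def is_special_file(ftype):
    """'is a special file': NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO."""
    return ftype in (NfsFtype4.NF4BLK, NfsFtype4.NF4CHR,
                     NfsFtype4.NF4SOCK, NfsFtype4.NF4FIFO)


def is_regular_file(ftype):
    """'is an ordinary file' / 'is a regular file': NF4REG or NF4NAMEDATTR."""
    return ftype in (NfsFtype4.NF4REG, NfsFtype4.NF4NAMEDATTR)
```

<t>
Note that NF4LNK satisfies none of the three predicates, since a symbolic link is not covered by any of the phrases.
</t>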
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fh_expire_type" numbered="true"> | ||||
| <name>Attribute 2: fh_expire_type</name> | ||||
| <t> | ||||
| The server uses this attribute to specify filehandle expiration behavior | ||||
| to the client. See <xref target="Filehandles" format="default"/> for additional | ||||
| description. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_change" numbered="true"> | ||||
| <name>Attribute 3: change</name> | ||||
| <t> | ||||
| A value created by the server that the client can use to | ||||
| determine if file data, directory contents, or attributes of | ||||
| the object have been modified. The server may return the | ||||
| object's time_metadata attribute for this attribute's value, | ||||
| but only if the file system object cannot be updated more | ||||
| frequently than the resolution of time_metadata. | ||||
| </t> | ||||
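<t>
One way a client might use the change attribute is sketched below: cached data is reused only while the change value fetched from the server equals the value saved with the cache entry. The class and function names are hypothetical, and the sketch compares values for equality only, making no assumption about ordering.
</t>

```python
class CachedObject:
    """Hypothetical client-side cache entry for one file system object."""

    def __init__(self, change, data):
        self.change = change  # change attribute seen when data was fetched
        self.data = data


def revalidate(cached, current_change, refetch):
    """Return cached data if unchanged; otherwise refetch and update."""
    if cached.change == current_change:
        return cached.data
    cached.data = refetch()
    cached.change = current_change
    return cached.data


# Usage: the second call sees a different change value and refetches.
fetches = []

def fake_refetch():
    fetches.append(1)
    return b"new contents"

entry = CachedObject(change=41, data=b"old contents")
first = revalidate(entry, 41, fake_refetch)
second = revalidate(entry, 42, fake_refetch)
```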
| </section> | ||||
| <section toc="exclude" anchor="attrdef_size" numbered="true"> | ||||
| <name>Attribute 4: size</name> | ||||
| <t> | ||||
| The size of the object in bytes. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_link_support" numbered="true"> | ||||
| <name>Attribute 5: link_support</name> | ||||
| <t> | ||||
| TRUE, if the object's file system supports hard links. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_symlink_support" numbered="true"> | ||||
| <name>Attribute 6: symlink_support</name> | ||||
| <t> | ||||
| TRUE, if the object's file system supports symbolic links. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_named_attr" numbered="true"> | ||||
| <name>Attribute 7: named_attr</name> | ||||
| <t> | ||||
| TRUE, if this object has named attributes. In other words, | ||||
| the object has a non-empty named attribute directory. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fsid" numbered="true"> | ||||
| <name>Attribute 8: fsid</name> | ||||
| <t> | ||||
| Unique file system identifier for the file system holding this | ||||
| object. The fsid attribute has major and minor components, each of | ||||
| which is of data type uint64_t. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_unique_handles" numbered="true"> | ||||
| <name>Attribute 9: unique_handles</name> | ||||
| <t> | ||||
| TRUE, if two distinct filehandles are guaranteed to refer to two | ||||
| different file system objects. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_lease_time" numbered="true"> | ||||
| <name>Attribute 10: lease_time</name> | ||||
| <t> | ||||
| Duration of the lease at the server, in seconds. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_rdattr_error" numbered="true"> | ||||
| <name>Attribute 11: rdattr_error</name> | ||||
| <t> | ||||
| Error returned from an attempt to retrieve attributes during a READDIR operation. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_filehandle" numbered="true"> | ||||
| <name>Attribute 19: filehandle</name> | ||||
| <t> | ||||
| The filehandle of this object (primarily for READDIR requests). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_suppattr_exclcreat" numbered="true"> | ||||
| <name>Attribute 75: suppattr_exclcreat</name> | ||||
| <t> | ||||
| The bit vector that would set all <bcp14>REQUIRED</bcp14> and | ||||
| <bcp14>RECOMMENDED</bcp14> attributes that are supported by the EXCLUSIVE4_1 | ||||
| method of file creation via the OPEN operation. | ||||
| The scope of this attribute applies to all objects with a | ||||
| matching fsid. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="recommended_attr" numbered="true" toc="default"> | ||||
| <name>Definitions of Uncategorized <bcp14>RECOMMENDED</bcp14> Attributes</name> | ||||
| <t> | ||||
| The definitions of most of the <bcp14>RECOMMENDED</bcp14> attributes follow. Collections | ||||
| that share a common category are defined in other sections. | ||||
| </t> | ||||
| <section toc="exclude" anchor="attrdef_archive" numbered="true"> | ||||
| <name>Attribute 14: archive</name> | ||||
| <t> | ||||
| TRUE, if this file has been archived since the time of last | ||||
| modification (deprecated in favor of time_backup). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_cansettime" numbered="true"> | ||||
| <name>Attribute 15: cansettime</name> | ||||
| <t> | ||||
| TRUE, if the server is able to change the times for a | ||||
| file system object as specified in a SETATTR operation. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_case_insensitive" numbered="true"> | ||||
| <name>Attribute 16: case_insensitive</name> | ||||
| <t> | ||||
| TRUE, if file name comparisons on this file system are case | ||||
| insensitive. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_case_preserving" numbered="true"> | ||||
| <name>Attribute 17: case_preserving</name> | ||||
| <t> | ||||
| TRUE, if file name case on this file system is preserved. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_change_policy" numbered="true"> | ||||
| <name>Attribute 60: change_policy</name> | ||||
| <t> | ||||
| A value created by the server that the client can use to | ||||
| determine if some server policy related to the current | ||||
| file system has been subject to change. If the value | ||||
| remains the same, then the client can be sure that the | ||||
| values of the attributes related to fs location | ||||
| and the fss_type field of the fs_status attribute have | ||||
| not changed. On the other hand, a change in this value does not | ||||
| necessarily imply a change in policy. It is up to the client | ||||
| to interrogate the server to determine if some policy relevant to | ||||
| it has changed. See <xref target="chg_policy4" format="default"/> for | ||||
| details. | ||||
| </t> | ||||
| <t> | ||||
| This attribute <bcp14>MUST</bcp14> change when the value returned by | ||||
| the fs_locations or fs_locations_info attribute changes, when | ||||
| a file system goes from read-only to writable or vice versa, | ||||
| or when the allowable set of security flavors for the file system | ||||
| or any part thereof is changed. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_chown_restricted" numbered="true"> | ||||
| <name>Attribute 18: chown_restricted</name> | ||||
| <t> | ||||
| If TRUE, the server will reject any request to change either | ||||
| the owner or the group associated with a file if the caller | ||||
| is not a privileged user (for example, "root" in UNIX | ||||
| operating environments or, in Windows 2000, the "Take | ||||
| Ownership" privilege). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fileid" numbered="true"> | ||||
| <name>Attribute 20: fileid</name> | ||||
| <t> | ||||
| A number uniquely identifying the file within the file system. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_files_avail" numbered="true"> | ||||
| <name>Attribute 21: files_avail</name> | ||||
| <t> | ||||
| File slots available to this user on the file system | ||||
| containing this object -- this should be the smallest | ||||
| relevant limit. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_files_free" numbered="true"> | ||||
| <name>Attribute 22: files_free</name> | ||||
| <t> | ||||
| Free file slots on the file system containing this object -- | ||||
| this should be the smallest relevant limit. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_files_total" numbered="true"> | ||||
| <name>Attribute 23: files_total</name> | ||||
| <t> | ||||
| Total file slots on the file system containing this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fs_charset_cap" numbered="true"> | ||||
| <name>Attribute 76: fs_charset_cap</name> | ||||
| <t> | ||||
| Character set capabilities for this file system. See | ||||
| <xref target="utf8_caps" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fs_locations" numbered="true"> | ||||
| <name>Attribute 24: fs_locations</name> | ||||
| <t> | ||||
| Locations where this file system may be found. If the server | ||||
| returns NFS4ERR_MOVED as an error, this attribute <bcp14>MUST</bcp14> be | ||||
| supported. | ||||
| See <xref target="fs_locations" format="default"/> for more details. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fs_locations_info" numbered="true"> | ||||
| <name>Attribute 67: fs_locations_info</name> | ||||
| <t> | ||||
| Full function file system location. | ||||
| See <xref target="SEC11-fsli-info" format="default"/> for more details. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_fs_status" numbered="true"> | ||||
| <name>Attribute 61: fs_status</name> | ||||
| <t> | ||||
| Generic file system type information. | ||||
| See <xref target="fs_status" format="default"/> for more details. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_hidden" numbered="true"> | ||||
| <name>Attribute 25: hidden</name> | ||||
| <t> | ||||
| TRUE, if the file is considered hidden with respect to | ||||
| the Windows API. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_homogeneous" numbered="true"> | ||||
| <name>Attribute 26: homogeneous</name> | ||||
| <t> | ||||
| TRUE, if this object's file system is homogeneous; i.e., all | ||||
| objects in the file system (all objects on the server with the | ||||
| same fsid) have common values for all per-file-system attributes. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_maxfilesize" numbered="true"> | ||||
| <name>Attribute 27: maxfilesize</name> | ||||
| <t> | ||||
| Maximum supported file size for the file system of this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_maxlink" numbered="true"> | ||||
| <name>Attribute 28: maxlink</name> | ||||
| <t> | ||||
| Maximum number of links for this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_maxname" numbered="true"> | ||||
| <name>Attribute 29: maxname</name> | ||||
| <t> | ||||
| Maximum file name size supported for this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_maxread" numbered="true"> | ||||
| <name>Attribute 30: maxread</name> | ||||
| <t> | ||||
| Maximum amount of data the READ operation will return for this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_maxwrite" numbered="true"> | ||||
| <name>Attribute 31: maxwrite</name> | ||||
| <t> | ||||
| Maximum amount of data the WRITE operation will accept for this object. | ||||
| This | ||||
| attribute <bcp14>SHOULD</bcp14> be supported if the file is writable. Lack | ||||
| of this attribute can lead to the client either wasting | ||||
| bandwidth or not receiving the best performance. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_mimetype" numbered="true"> | ||||
| <name>Attribute 32: mimetype</name> | ||||
| <t> | ||||
| MIME body type/subtype of this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_mounted_on_fileid" numbered="true"> | ||||
| <name>Attribute 55: mounted_on_fileid</name> | ||||
| <t> | ||||
| Like fileid, but if the target filehandle is the root of a | ||||
| file system, this attribute represents the fileid of the | ||||
| underlying directory. | ||||
| </t> | ||||
| <t> | ||||
| UNIX-based operating environments connect a file system into | ||||
| the namespace by connecting (mounting) the file system onto | ||||
| the existing file object (the mount point, usually a | ||||
| directory) of an existing file system. When the mount point's | ||||
| parent directory is read via an API like readdir(), the returned | ||||
| results are directory entries, each with a component name and | ||||
| a fileid. The fileid of the mount point's directory entry will | ||||
| be different from the fileid that the stat() system call | ||||
| returns. The stat() system call is returning the fileid of the | ||||
| root of the mounted file system, whereas readdir() is | ||||
| returning the fileid that stat() would have returned before any | ||||
| file systems were mounted on the mount point. | ||||
| </t> | ||||
| <t> | ||||
| Unlike NFSv3, NFSv4.1 allows a client's LOOKUP | ||||
| request to cross other file systems. The client detects the | ||||
| file system crossing whenever the filehandle argument of | ||||
| LOOKUP has an fsid attribute different from that of the | ||||
| filehandle returned by LOOKUP. A UNIX-based client will | ||||
| consider this a "mount point crossing". UNIX has a legacy | ||||
| scheme for allowing a process to determine its current working | ||||
| directory. This relies on readdir() of a mount point's parent | ||||
| and stat() of the mount point returning fileids as previously | ||||
| described. The mounted_on_fileid attribute corresponds to the | ||||
| fileid that readdir() would have returned as described | ||||
| previously. | ||||
| </t> | ||||
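<t>
The crossing check described above can be sketched as a comparison of fsid values, followed by a choice of which fileid to report for the directory entry. This is a simplified illustration; the Fsid tuple and the function names are hypothetical, and the attribute values are assumed to have been fetched already.
</t>

```python
from typing import NamedTuple


class Fsid(NamedTuple):
    major: int
    minor: int


def crossed_file_system(dir_fsid, child_fsid):
    """True when LOOKUP moved from one file system into another."""
    return dir_fsid != child_fsid


def dirent_fileid(dir_fsid, child_fsid, fileid, mounted_on_fileid=None):
    """Fileid a readdir()-style listing of the parent would report.

    At a crossing, prefer the server-provided mounted_on_fileid; if the
    server does not supply it, this sketch falls back to fileid.
    """
    if crossed_file_system(dir_fsid, child_fsid) and mounted_on_fileid is not None:
        return mounted_on_fileid
    return fileid


parent = Fsid(major=1, minor=0)
child = Fsid(major=7, minor=0)
```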
| <t> | ||||
| While the NFSv4.1 client could simply fabricate a fileid | ||||
| corresponding to what mounted_on_fileid provides (and if the | ||||
| server does not support mounted_on_fileid, the client has no | ||||
| choice), there is a risk that the client will generate a | ||||
| fileid that conflicts with one that is already assigned to | ||||
| another object in the file system. Instead, if the server can | ||||
| provide the mounted_on_fileid, the potential for client | ||||
| operational problems in this area is eliminated. | ||||
| </t> | ||||
| <t> | ||||
| If the server detects that there is no mount point at the | ||||
| target file object, then the value for mounted_on_fileid that | ||||
| it returns is the same as that of the fileid attribute. | ||||
| </t> | ||||
| <t> | ||||
| The mounted_on_fileid attribute is <bcp14>RECOMMENDED</bcp14>, so the server | ||||
| <bcp14>SHOULD</bcp14> provide it if possible, and for a UNIX-based server, | ||||
| this is straightforward. Usually, mounted_on_fileid will be | ||||
| requested during a READDIR operation, in which case it is | ||||
| trivial (at least for UNIX-based servers) to return | ||||
| mounted_on_fileid since it is equal to the fileid of a | ||||
| directory entry returned by readdir(). If mounted_on_fileid | ||||
| is requested in a GETATTR operation, the server should obey an | ||||
| invariant that has it returning a value that is equal to the | ||||
| file object's entry in the object's parent directory, | ||||
| i.e., what readdir() would have returned. Some operating | ||||
| environments allow a series of two or more file systems to be | ||||
| mounted onto a single mount point. In this case, for the | ||||
| server to obey the aforementioned invariant, it will need to | ||||
| find the base mount point, and not the intermediate mount | ||||
| points. | ||||
| </t> | ||||
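<t>
The invariant for stacked mounts can be sketched as a walk from the root of the topmost mounted file system down to the base mount point. The model below is an assumption made for illustration: each object records the object it is mounted on (covers), if any, and the names are hypothetical.
</t>

```python
class FsObject:
    """Hypothetical server-side object; covers is what it is mounted on."""

    def __init__(self, fileid, covers=None):
        self.fileid = fileid
        self.covers = covers


def mounted_on_fileid(obj):
    """Fileid of the base mount point, skipping intermediate mounts.

    For an object with nothing beneath it, this equals its own fileid.
    """
    base = obj
    while base.covers is not None:
        base = base.covers
    return base.fileid


# A directory with fileid 100, with two file systems stacked on it.
mount_point = FsObject(fileid=100)
lower_root = FsObject(fileid=2, covers=mount_point)
upper_root = FsObject(fileid=2, covers=lower_root)
```

<t>
Here both stacked roots report 100, the fileid that the entry in the parent directory carries, while the plain directory reports its own fileid.
</t>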
| </section> | ||||
| <section toc="exclude" anchor="attrdef_no_trunc" numbered="true"> | ||||
| <name>Attribute 34: no_trunc</name> | ||||
| <t> | ||||
| If this attribute is TRUE, then if the client uses a file | ||||
| name longer than the limit given by the maxname attribute, an error will be | ||||
| returned instead of the name being truncated. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_numlinks" numbered="true"> | ||||
| <name>Attribute 35: numlinks</name> | ||||
| <t> | ||||
| Number of hard links to this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_owner" numbered="true"> | ||||
| <name>Attribute 36: owner</name> | ||||
| <t> | ||||
| The string name of the owner of this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_owner_group" numbered="true"> | ||||
| <name>Attribute 37: owner_group</name> | ||||
| <t> | ||||
| The string name of the group ownership of this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_quota_avail_hard" numbered="true"> | ||||
| <name>Attribute 38: quota_avail_hard</name> | ||||
| <t anchor="quota_avail_hard"> | ||||
| The value in bytes that represents the amount of additional | ||||
| disk space beyond the current allocation that can be allocated | ||||
| to this file or directory before further allocations will be | ||||
| refused. It is understood that this space may be consumed by | ||||
| allocations to other files or directories. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_quota_avail_soft" numbered="true"> | ||||
| <name>Attribute 39: quota_avail_soft</name> | ||||
| <t anchor="quota_avail_soft"> | ||||
| The value in bytes that represents the amount of additional | ||||
| disk space that can be allocated to this file or directory | ||||
| before the user may reasonably be warned. It is understood | ||||
| that this space may be consumed by allocations to other files | ||||
| or directories, though there is a rule as to which other files | ||||
| or directories. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_quota_used" numbered="true"> | ||||
| <name>Attribute 40: quota_used</name> | ||||
| <t anchor="quota_used"> | ||||
| The value in bytes that represents the amount of disk | ||||
| space used by this file or directory and possibly a | ||||
| number of other similar files or directories, where the | ||||
| set of "similar" meets at least the criterion that | ||||
| allocating space to any file or directory in the set | ||||
| will reduce the "quota_avail_hard" of every other file | ||||
| or directory in the set. | ||||
| </t> | ||||
| <t> | ||||
| Note that there may be a number of distinct but | ||||
| overlapping sets of files or directories for which a | ||||
| quota_used value is maintained, e.g., "all files with a | ||||
| given owner", "all files with a given group owner", etc. | ||||
| The server is at liberty to choose any of those sets when | ||||
| providing the content of the quota_used attribute, but | ||||
| should do so in a repeatable way. The rule may be | ||||
| configured per file system or may be "choose the set with | ||||
| the smallest quota". | ||||
| </t> | ||||
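<t>
As a hedged illustration of the "choose the set with the smallest quota" rule mentioned above, the sketch below selects one of several overlapping quota sets in a repeatable way. The tuple layout and the tie-break on the set name are assumptions made for this example, not requirements of the protocol.
</t>

```python
def pick_quota_set(quota_sets):
    """Pick the set with the smallest quota limit.

    quota_sets is a list of (name, limit_bytes, used_bytes) tuples.
    Ties are broken by name so that repeated calls give the same answer.
    """
    return min(quota_sets, key=lambda s: (s[1], s[0]))


def quota_used(quota_sets):
    """Value to report for quota_used under the rule above."""
    return pick_quota_set(quota_sets)[2]


sets_for_file = [
    ("all files with owner alice", 10_000_000, 4_200_000),
    ("all files with group staff", 5_000_000, 4_900_000),
]
```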
| </section> | ||||
| <section toc="exclude" anchor="attrdef_rawdev" numbered="true"> | ||||
| <name>Attribute 41: rawdev</name> | ||||
| <t> | ||||
| Raw device number of a file of type NF4BLK or NF4CHR. The device | ||||
| number is split into major and minor numbers. | ||||
| If the file's type attribute is not NF4BLK or NF4CHR, | ||||
| the value returned <bcp14>SHOULD NOT</bcp14> be considered useful. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_space_avail" numbered="true"> | ||||
| <name>Attribute 42: space_avail</name> | ||||
| <t> | ||||
| Disk space in bytes available to this user on the file system | ||||
| containing this object -- this should be the smallest | ||||
| relevant limit. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_space_free" numbered="true"> | ||||
| <name>Attribute 43: space_free</name> | ||||
| <t> | ||||
| Free disk space in bytes on the file system containing this | ||||
| object -- this should be the smallest relevant limit. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_space_total" numbered="true"> | ||||
| <name>Attribute 44: space_total</name> | ||||
| <t> | ||||
| Total disk space in bytes on the file system containing this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_space_used" numbered="true"> | ||||
| <name>Attribute 45: space_used</name> | ||||
| <t> | ||||
| Number of file system bytes allocated to this object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_system" numbered="true"> | ||||
| <name>Attribute 46: system</name> | ||||
| <t> | ||||
| This attribute is TRUE if this file is a "system" file with | ||||
| respect to the Windows operating environment. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_access" numbered="true"> | ||||
| <name>Attribute 47: time_access</name> | ||||
| <t> | ||||
| The time_access attribute represents the time of last access to | ||||
| the object by a READ operation sent to the server. The notion | ||||
| of what is an "access" depends on the server's operating environment | ||||
| and/or the server's file system semantics. For example, for | ||||
| servers obeying Portable Operating System Interface (POSIX) semantics, time_access would be updated only | ||||
| by the READ and READDIR operations and not any of the operations | ||||
| that modify the content of the object <xref target="read_atime" format="default"/>, | ||||
| <xref target="readdir_atime" format="default"/>, <xref target="write_atime" format="default"/>. Of | ||||
| course, setting the corresponding time_access_set attribute is | ||||
| another way to modify the time_access attribute. | ||||
| </t> | ||||
| <t> | ||||
| Whenever the file object resides on a writable file system, | ||||
| the server should make its best efforts to record time_access into | ||||
| stable storage. However, to mitigate the performance effects | ||||
| of doing so, and most especially whenever the server is | ||||
| satisfying the read of the object's content from its cache, | ||||
| the server <bcp14>MAY</bcp14> cache access time updates and lazily write them | ||||
| to stable storage. It is also acceptable to give | ||||
| administrators of the server the option to disable time_access | ||||
| updates. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_access_set" numbered="true"> | ||||
| <name>Attribute 48: time_access_set</name> | ||||
| <t> | ||||
| Sets the time of last access to the object. SETATTR use only. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_backup" numbered="true"> | ||||
| <name>Attribute 49: time_backup</name> | ||||
| <t> | ||||
| The time of last backup of the object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_create" numbered="true"> | ||||
| <name>Attribute 50: time_create</name> | ||||
| <t> | ||||
| The time of creation of the object. This attribute does not | ||||
| have any relation to the traditional UNIX file attribute | ||||
| "ctime" or "change time". | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_delta" numbered="true"> | ||||
| <name>Attribute 51: time_delta</name> | ||||
| <t> | ||||
| Smallest useful server time granularity. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_metadata" numbered="true"> | ||||
| <name>Attribute 52: time_metadata</name> | ||||
| <t> | ||||
| The time of last metadata modification of the object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_modify" numbered="true"> | ||||
| <name>Attribute 53: time_modify</name> | ||||
| <t> | ||||
| The time of last modification to the object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_time_modify_set" numbered="true"> | ||||
| <name>Attribute 54: time_modify_set</name> | ||||
| <t> | ||||
| Sets the time of last modification to the object. SETATTR use only. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="owner_owner_group" numbered="true" toc="default"> | ||||
| <name>Interpreting owner and owner_group</name> | ||||
| <t> | ||||
| The <bcp14>RECOMMENDED</bcp14> attributes "owner" and "owner_group" (and also | ||||
| users and groups within the "acl" attribute) are represented in | ||||
| terms of a UTF-8 string. To avoid a representation that is tied | ||||
| to a particular underlying implementation at the client or | ||||
| server, the use of the UTF-8 string has been chosen. Note that | ||||
| Section <xref target="RFC2624" sectionFormat="bare" section="6.1"/> | ||||
| of RFC 2624 <xref target="RFC2624" format="default"/> provides | ||||
| additional rationale. It is expected that the client and server | ||||
| will have their own local representation of owner and | ||||
| owner_group that is used for local storage or presentation to | ||||
| the end user. Therefore, it is expected that when these | ||||
| attributes are transferred between the client and server, | ||||
| the local representation is translated to a syntax of the form | ||||
| "user@dns_domain". This allows a client and server that | ||||
| do not use the same local representation to | ||||
| translate to a common syntax that can be interpreted by both. | ||||
| </t> | ||||
| <t> | ||||
| Similarly, security principals may be represented in different | ||||
| ways by different security mechanisms. Servers normally | ||||
| translate these representations into a common format, | ||||
| generally that used by local storage, to serve as a means of | ||||
| identifying the users corresponding to these security | ||||
| principals. When these local identifiers are translated to | ||||
| the form of the owner attribute, associated with files created | ||||
| by such principals, they identify, in a common format, the | ||||
| users associated with each corresponding set of security | ||||
| principals. | ||||
| </t> | ||||
| <t> | ||||
| The translation used to interpret owner and group strings is | ||||
| not specified as part of the protocol. This allows various | ||||
| solutions to be employed. For example, a local translation | ||||
| table may be consulted that maps a numeric identifier to the | ||||
| user@dns_domain syntax. A name service may also be used to | ||||
| accomplish the translation. A server may provide a more | ||||
| general service, not limited by any particular translation | ||||
| (which would only translate a limited set of possible strings) | ||||
| by storing the owner and owner_group attributes in local | ||||
| storage without any translation or it may augment a | ||||
| translation method by storing the entire string for attributes | ||||
| for which no translation is available while using the local | ||||
| representation for those cases in which a translation is | ||||
| available. | ||||
| </t> | ||||
| <t> | ||||
| Servers that do not provide support for all possible values of | ||||
| the owner and owner_group attributes <bcp14>SHOULD</bcp14> return an error | ||||
| (NFS4ERR_BADOWNER) when a string is presented that has no | ||||
| translation, as the value to be set for a SETATTR of the | ||||
| owner, owner_group, or acl attributes. When a server does | ||||
| accept an owner or owner_group value as valid on a SETATTR | ||||
| (and similarly for the owner and group strings in an acl), it | ||||
| is promising to return that same string when a corresponding | ||||
| GETATTR is done. Configuration changes (including | ||||
| changes to the mapping of the string to the local representation) | ||||
| and ill-constructed | ||||
| name translations (those that contain aliasing) may make that | ||||
| promise impossible to honor. Servers should make appropriate | ||||
| efforts to avoid a situation in which these attributes have | ||||
| their values changed when no real change to ownership has | ||||
| occurred. | ||||
| </t> | ||||
| <t> | ||||
| The "dns_domain" portion of the owner string is meant to be a | ||||
| DNS domain name, for example, user@example.org. Servers should | ||||
| accept as valid a set of users for at least one domain. A | ||||
| server may treat other domains as having no valid | ||||
| translations. A more general service is provided when a | ||||
| server is capable of accepting users for multiple domains, or | ||||
| for all domains, subject to security constraints. | ||||
| </t> | ||||
| <t> | ||||
| In the case where there is no translation available to the | ||||
| client or server, the attribute value will be constructed | ||||
| without the "@". Therefore, the absence of the @ from the | ||||
| owner or owner_group attribute signifies that no translation | ||||
| was available at the sender and that the receiver of the | ||||
| attribute should not use that string as a basis for | ||||
| translation into its own internal format. Even though the | ||||
| attribute value cannot be translated, it may still be useful. | ||||
| In the case of a client, the attribute string may be used for | ||||
| local display of ownership. | ||||
| </t> | ||||
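<t>
A receiver's first step in handling an owner or owner_group string might look like the following sketch: a string containing "@" is a candidate for translation, while a string without "@" is kept only for display. Splitting at the last "@" is an assumption made here for illustration, and the function name is hypothetical.
</t>

```python
def split_owner(owner):
    """Return (name, dns_domain), or (owner, None) if there is no '@'.

    A None domain signals that the sender had no translation and that
    the string should not be used as a basis for local translation.
    """
    if "@" not in owner:
        return owner, None
    name, _, domain = owner.rpartition("@")
    return name, domain


def display_owner(owner):
    """Even an untranslatable string may still be shown to the user."""
    return owner
```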
| <t> | ||||
| To provide a greater degree of compatibility with NFSv3, | ||||
| which identified users and groups by 32-bit unsigned user | ||||
| identifiers and group identifiers, owner and group strings that | ||||
| consist of decimal numeric values with no leading zeros can be | ||||
| given a special interpretation by clients and servers that | ||||
| choose to provide such support. The receiver may treat such a | ||||
| user or group string as representing the same user as would be | ||||
| represented by an NFSv3 uid or gid having the corresponding | ||||
| numeric value. A server is not obligated to accept such a | ||||
| string, but may return an NFS4ERR_BADOWNER instead. To avoid | ||||
| this mechanism being used to subvert user and group translation, | ||||
| so that a client might pass all of the owners and groups in | ||||
| numeric form, a server <bcp14>SHOULD</bcp14> return an NFS4ERR_BADOWNER error | ||||
| when there is a valid translation for the user or group | ||||
| designated in this way. In that case, the client must use the | ||||
| appropriate name@domain string and not the special form for compatibility. | ||||
| </t> | ||||
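<t>
The numeric compatibility form can be sketched as follows. The parser accepts only decimal strings with no leading zeros that fit in 32 bits; the server-side policy function then refuses the numeric form whenever a name translation exists for that identifier. The names, the dictionary standing in for a translation table, and the returned status strings are all illustrative assumptions.
</t>

```python
def parse_numeric_owner(text):
    """Return the identifier for a valid numeric owner string, else None."""
    if not text.isascii() or not text.isdigit():
        return None
    if len(text) > 1 and text[0] == "0":
        return None  # leading zeros are not allowed
    value = int(text)
    if value > 0xFFFFFFFF:
        return None  # does not fit an unsigned 32-bit identifier
    return value


def server_resolve_numeric(text, names_by_id, accept_numeric=True):
    """Return (status, identifier) for a numeric owner string.

    names_by_id is a hypothetical table of identifiers that have a
    valid name@domain translation on this server.
    """
    ident = parse_numeric_owner(text)
    if ident is None or not accept_numeric or ident in names_by_id:
        return "NFS4ERR_BADOWNER", None
    return "NFS4_OK", ident


known = {1000: "alice@example.org"}
```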
| <t> | ||||
| The owner string "nobody" may be used to designate an | ||||
| anonymous user, which will be associated with a file created | ||||
| by a security principal that cannot be mapped through normal | ||||
| means to the owner attribute. Users and implementations | ||||
| of NFSv4.1 <bcp14>SHOULD NOT</bcp14> use "nobody" to designate a real user whose access is not anonymous. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="character_case_attributes" numbered="true" toc="default"> | ||||
| <name>Character Case Attributes</name> | ||||
| <t> | ||||
| With respect to the case_insensitive and case_preserving | ||||
| attributes, each UCS-4 character (which UTF-8 encodes) can be | ||||
| mapped according to Appendix | ||||
| <xref target="RFC3454" sectionFormat="bare" section="B.2"/> | ||||
| of RFC 3454 <xref target="RFC3454" format="default"/>. | ||||
| For general character handling and internationalization issues, | ||||
| see <xref target="internationalization" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="dir_not_attrs" numbered="true" toc="default"> | ||||
| <name>Directory Notification Attributes</name> | ||||
| <t> | ||||
| As described in <xref target="OP_GET_DIR_DELEGATION" format="default"/>, the | ||||
| client can request a minimum delay for notifications of changes | ||||
| to attributes, but the server is free to ignore what the client | ||||
| requests. The client can determine in advance what notification | ||||
| delays the server will accept by sending a GETATTR operation for either or | ||||
| both of two directory notification attributes. When the client | ||||
| calls the GET_DIR_DELEGATION operation and asks for attribute | ||||
| change notifications, it should request notification delays that | ||||
| are no less than the values in the server-provided attributes. | ||||
| </t> | ||||
| <section toc="exclude" anchor="attrdef_dir_notif_delay" numbered="true"> | ||||
| <name>Attribute 56: dir_notif_delay</name> | ||||
| <t> | ||||
| The dir_notif_delay attribute is the minimum number of seconds | ||||
| the server will delay before notifying the client of a change | ||||
| to the directory's attributes. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_dirent_notif_delay" numbered="true"> | ||||
| <name>Attribute 57: dirent_notif_delay</name> | ||||
| <t> | ||||
| The dirent_notif_delay attribute is the minimum number of seconds | ||||
| the server will delay before notifying the client of a change | ||||
| to a file object that has an entry in the directory. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="pnfs_attr_full" numbered="true" toc="default"> | ||||
| <name>pNFS Attribute Definitions</name> | ||||
| <section toc="exclude" anchor="attrdef_fs_layout_type" numbered="true"> | ||||
| <name>Attribute 62: fs_layout_type</name> | ||||
| <t> | ||||
| The fs_layout_type attribute (see | ||||
| <xref target="layouttype4" format="default"/>) applies to a | ||||
| file system and indicates what layout types are supported by | ||||
| the file system. When the client encounters a new fsid, the | ||||
| client <bcp14>SHOULD</bcp14> obtain the value for the fs_layout_type | ||||
| attribute associated with the new file system. This attribute | ||||
| is used by the client to determine if the layout types | ||||
| supported by the server match any of the client's supported | ||||
| layout types. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_layout_alignment" numbered="true"> | ||||
| <name>Attribute 66: layout_alignment</name> | ||||
| <t> | ||||
| When a client holds layouts on files of a file system, the | ||||
| layout_alignment attribute indicates the preferred alignment | ||||
| for I/O to files on that file system. Where possible, the | ||||
| client should send READ and WRITE operations with offsets | ||||
| that are whole multiples of the layout_alignment attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_layout_blksize" numbered="true"> | ||||
| <name>Attribute 65: layout_blksize</name> | ||||
| <t> | ||||
| When a client holds layouts on files of a file system, the | ||||
| layout_blksize attribute indicates the preferred block size | ||||
| for I/O to files on that file system. Where possible, the | ||||
| client should send READ operations with a count argument that | ||||
| is a whole multiple of layout_blksize, and WRITE operations | ||||
| with a data argument of size that is a whole multiple of | ||||
| layout_blksize. | ||||
| </t> | ||||
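| <t> | ||||
| As an illustration of the layout_alignment and layout_blksize preferences, the following non-normative sketch checks whether a request is in the preferred form and widens a READ so that it is. The helper names are hypothetical, and both attribute values are assumed to be positive. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def is_preferred_io(offset, length, layout_alignment, layout_blksize):
    """True if offset is a multiple of layout_alignment and
    length is a multiple of layout_blksize (both assumed positive)."""
    return offset % layout_alignment == 0 and length % layout_blksize == 0


def widen_read(offset, length, layout_alignment, layout_blksize):
    """Return (start, count) covering [offset, offset + length) with
    start rounded down to the alignment and count rounded up to a
    whole number of blocks."""
    start = offset - offset % layout_alignment
    span = offset + length - start
    count = -(-span // layout_blksize) * layout_blksize  # ceiling division
    return start, count
```
| ]]></sourcecode> | ||||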
| </section> | ||||
| <section toc="exclude" anchor="attrdef_layout_hint" numbered="true"> | ||||
| <name>Attribute 63: layout_hint</name> | ||||
| <t> | ||||
| The layout_hint attribute (see | ||||
| <xref target="layouthint4" format="default"/>) may be set on | ||||
| newly created files to influence the metadata server's choice | ||||
| for the file's layout. If possible, this attribute is one of | ||||
| those set in the initial attributes within the OPEN operation. | ||||
| The metadata server may choose to ignore this attribute. The | ||||
| layout_hint attribute is a subset of the layout structure | ||||
| returned by LAYOUTGET. For example, instead of specifying | ||||
| particular devices, this would be used to suggest the stripe | ||||
| width of a file. The server implementation determines which | ||||
| fields within the layout will be used. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_layout_type" numbered="true"> | ||||
| <name>Attribute 64: layout_type</name> | ||||
| <t> | ||||
| This attribute lists the layout type(s) available for a file. | ||||
| The value returned by the server is for informational purposes | ||||
| only. The client will use the LAYOUTGET operation to obtain | ||||
| the information needed in order to perform I/O, for example, | ||||
| the specific device information for the file and its layout. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_mdsthreshold" numbered="true"> | ||||
| <name>Attribute 68: mdsthreshold</name> | ||||
| <t> | ||||
| This attribute is a server-provided hint used to communicate | ||||
| to the client when it is more efficient to send READ and | ||||
| WRITE operations to the metadata server or the data server. | ||||
| The two types of thresholds described are file size thresholds | ||||
| and I/O size thresholds. If a file's size is smaller than the | ||||
| file size threshold, data accesses <bcp14>SHOULD</bcp14> be sent to the | ||||
| metadata server. If an I/O request has a length | ||||
| that is below the I/O size threshold, | ||||
| the I/O <bcp14>SHOULD</bcp14> be sent to the metadata server. | ||||
| Each threshold type is specified separately for read and | ||||
| write. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> provide both types of thresholds for a file. | ||||
| If both file size and I/O size thresholds are provided, the client <bcp14>SHOULD</bcp14> | ||||
| send its read or write requests to the data server only when both | ||||
| thresholds are reached or exceeded. If only one of | ||||
| the specified thresholds is reached or exceeded, the I/O requests are | ||||
| sent to the metadata server. | ||||
| </t> | ||||
| <t> | ||||
| For each threshold type, a value of zero indicates that no READ or WRITE | ||||
| should be sent to the metadata server, while a value of all ones | ||||
| indicates that all READs or WRITEs should be sent to the metadata | ||||
| server. | ||||
| </t> | ||||
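| <t> | ||||
| The threshold rules above can be sketched, for one direction (read or write), as follows. This is a non-normative illustration: the function names are hypothetical, a threshold that the server did not provide is modeled as None, and the all-ones value is assumed to be that of a 64-bit unsigned field. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF  # assumed width of the threshold field


def below(value, threshold):
    """True if this threshold says the I/O belongs on the metadata server."""
    if threshold is None:       # threshold not provided by the server
        return False
    if threshold == ALL_ONES:   # all ones: always use the metadata server
        return True
    return value < threshold    # zero: nothing is below it, so never the MDS


def choose_target(file_size, io_length, size_threshold=None, io_threshold=None):
    """Return "MDS" or "DS" for a single READ or WRITE."""
    if below(file_size, size_threshold) or below(io_length, io_threshold):
        return "MDS"
    # Every provided threshold has been reached or exceeded.
    return "DS"
```
| ]]></sourcecode> | ||||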
| <t> | ||||
| The attribute is available on a per-filehandle basis. If the | ||||
| current filehandle refers to a non-pNFS file or directory, the | ||||
| metadata server should return an attribute that is | ||||
| representative of the filehandle's file system. It is suggested | ||||
| that this attribute be queried as part of the OPEN operation. | ||||
| Due to dynamic system changes, the client should not assume that | ||||
| the attribute will remain constant for any specific time period; | ||||
| thus, it should be periodically refreshed. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] "PNFS Attributes" --> | ||||
| <section anchor="retention" numbered="true" toc="default"> | ||||
| <name>Retention Attributes</name> | ||||
| <t> | ||||
| Retention is a concept whereby a file object can be placed in an | ||||
| immutable, undeletable, unrenamable state for a fixed or | ||||
| infinite duration of time. Once in this "retained" state, the | ||||
| file cannot be moved out of the state until the duration of | ||||
| retention has been reached. | ||||
| </t> | ||||
| <t> | ||||
| When retention is enabled, retention <bcp14>MUST</bcp14> extend to the data of | ||||
| the file, and the name of the file. The server <bcp14>MAY</bcp14> extend retention | ||||
| to any other property of the file, including any subset of | ||||
| <bcp14>REQUIRED</bcp14>, <bcp14>RECOMMENDED</bcp14>, and named attributes, with the | ||||
| exceptions noted in this section. | ||||
| </t> | ||||
| <t> | ||||
| Servers <bcp14>MAY</bcp14> support or not support retention on | ||||
| any file object type. | ||||
| </t> | ||||
| <t> | ||||
| The five retention attributes are explained in the next subsections. | ||||
| </t> | ||||
| <section toc="exclude" anchor="attrdef_retention_get" numbered="true"> | ||||
| <name>Attribute 69: retention_get</name> | ||||
| <t> | ||||
| If retention is enabled for the associated file, | ||||
| this attribute's value represents the retention | ||||
| begin time of the file object. This attribute's | ||||
| value is only readable with the GETATTR operation | ||||
| and <bcp14>MUST NOT</bcp14> be modified by the SETATTR operation | ||||
| (<xref target="rw_attr" format="default"/>). The value of the | ||||
| attribute consists of: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const RET4_DURATION_INFINITE = 0xffffffffffffffff; | ||||
| struct retention_get4 { | ||||
| uint64_t rg_duration; | ||||
| nfstime4 rg_begin_time<1>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The field rg_duration is the duration in seconds indicating how | ||||
| long the file will be retained once retention is enabled. The | ||||
| field rg_begin_time is an array of up to one absolute time | ||||
| value. If the array is zero length, no beginning retention time | ||||
| has been established, and retention is not enabled. | ||||
| If rg_duration is equal to RET4_DURATION_INFINITE, the file, once | ||||
| retention is enabled, will be retained for an infinite duration. | ||||
| </t> | ||||
| <t> | ||||
| If, or as soon as, rg_duration is zero, rg_begin_time will be | ||||
| of zero length, and retention is not, or is no longer, enabled. | ||||
| </t> | ||||
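| <t> | ||||
| A client might interpret a decoded retention_get value along the following lines. This is a non-normative sketch: the function name and the returned labels are hypothetical, and rg_begin_time is modeled as a Python list holding zero or one time value. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
RET4_DURATION_INFINITE = 0xFFFFFFFFFFFFFFFF


def retention_state(rg_duration, rg_begin_time):
    """Classify a retention_get4 value.

    rg_duration   -- duration in seconds (unsigned 64-bit)
    rg_begin_time -- list with zero or one begin-time values
    """
    if len(rg_begin_time) == 0 or rg_duration == 0:
        # No begin time established, or the duration is zero.
        return "not enabled"
    if rg_duration == RET4_DURATION_INFINITE:
        return "retained forever"
    return "retained"
```
| ]]></sourcecode> | ||||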
| </section> | ||||
| <section toc="exclude" anchor="attrdef_retention_set" numbered="true"> | ||||
| <name>Attribute 70: retention_set</name> | ||||
| <t> | ||||
| This attribute is used to set the retention | ||||
| duration and optionally enable retention for | ||||
| the associated file object. This attribute is | ||||
| only modifiable via the SETATTR operation and | ||||
| <bcp14>MUST NOT</bcp14> be retrieved by the GETATTR operation | ||||
| (<xref target="rw_attr" format="default"/>). | ||||
| This attribute corresponds to retention_get. | ||||
| The value of the attribute consists of: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct retention_set4 { | ||||
| bool rs_enable; | ||||
| uint64_t rs_duration<1>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| If the client sets rs_enable to TRUE, then it is enabling | ||||
| retention on the file object with the begin time of retention | ||||
| starting from the server's current time and date. The | ||||
| duration of the retention can also be provided if the | ||||
| rs_duration array is of length one. The duration is the time in | ||||
| seconds from the begin time of retention, and if set to | ||||
| RET4_DURATION_INFINITE, the file is to be retained forever. If | ||||
| retention is enabled, with no duration specified in either | ||||
| this SETATTR or a previous SETATTR, the duration defaults to | ||||
| zero seconds. The server <bcp14>MAY</bcp14> restrict the enabling of | ||||
| retention or the duration of retention on the basis of the | ||||
| ACE4_WRITE_RETENTION ACL permission. The enabling of | ||||
| retention <bcp14>MUST NOT</bcp14> prevent the enabling of event-based | ||||
| retention or the modification of the retention_hold | ||||
| attribute. | ||||
| </t> | ||||
| <t> | ||||
| The following rules apply to both the retention_set and | ||||
| retentevt_set attributes. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| As long as retention is not enabled, the client | ||||
| is permitted to decrease the duration. | ||||
| </li> | ||||
| <li> | ||||
| The duration can always be set to an | ||||
| equal or higher value, even if retention is | ||||
| enabled. Note that once retention is enabled, | ||||
| the actual duration (as returned by the | ||||
| retention_get or retentevt_get attributes; | ||||
| see <xref target="attrdef_retention_get" format="default"/> | ||||
| or <xref target="attrdef_retentevt_get" format="default"/>) | ||||
| is constantly counting down to zero (one unit | ||||
| per second), unless the duration was set to | ||||
| RET4_DURATION_INFINITE. Thus, it will not be | ||||
| possible for the client to precisely extend the | ||||
| duration on a file that has retention enabled. | ||||
| </li> | ||||
| <li> | ||||
| While retention is enabled, attempts to disable | ||||
| retention or decrease the retention's duration | ||||
| <bcp14>MUST</bcp14> fail with the error NFS4ERR_INVAL. | ||||
| </li> | ||||
| <li> | ||||
| If the principal attempting to change | ||||
| retention_set or retentevt_set does not have | ||||
| ACE4_WRITE_RETENTION permissions, the attempt | ||||
| <bcp14>MUST</bcp14> fail with NFS4ERR_ACCESS. | ||||
| </li> | ||||
| </ul> | ||||
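| <t> | ||||
| The rules in the list above can be sketched as a server-side check. This is a non-normative illustration: the state dictionary and function name are hypothetical, the order in which the two errors are tested is an assumption, and a request with rs_enable set to FALSE on a file whose retention is enabled is treated here as an attempt to disable retention. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def apply_retention_set(state, rs_enable, rs_duration, has_write_retention):
    """Apply a retention_set4 value to a file's retention state.

    state               -- dict with "enabled" (bool) and "duration" (seconds)
    rs_enable           -- bool from the request
    rs_duration         -- list with zero or one duration values
    has_write_retention -- whether the principal holds ACE4_WRITE_RETENTION
    Returns the name of the resulting status code.
    """
    if not has_write_retention:
        return "NFS4ERR_ACCESS"
    if state["enabled"]:
        if not rs_enable:
            return "NFS4ERR_INVAL"   # attempt to disable retention
        if rs_duration and rs_duration[0] < state["duration"]:
            return "NFS4ERR_INVAL"   # attempt to decrease the duration
    if rs_duration:
        state["duration"] = rs_duration[0]
    if rs_enable:
        state["enabled"] = True
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||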
| </section> | ||||
| <section toc="exclude" anchor="attrdef_retentevt_get" numbered="true"> | ||||
| <name>Attribute 71: retentevt_get</name> | ||||
| <t> | ||||
| Gets the event-based retention duration, and if enabled, the | ||||
| event-based retention begin time of the file object. This | ||||
| attribute is like retention_get, but refers to event-based | ||||
| retention. The event that triggers event-based retention is | ||||
| not defined by the NFSv4.1 specification. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_retentevt_set" numbered="true"> | ||||
| <name>Attribute 72: retentevt_set</name> | ||||
| <t> | ||||
| Sets the event-based retention duration, and optionally enables | ||||
| event-based retention on the file object. This attribute | ||||
| corresponds to retentevt_get and is like retention_set, but | ||||
| refers to event-based retention. When event-based retention | ||||
| is set, the file <bcp14>MUST</bcp14> be retained even if non-event-based | ||||
| retention has been set, and the duration of non-event-based | ||||
| retention has been reached. Conversely, when non-event-based | ||||
| retention has been set, the file <bcp14>MUST</bcp14> be retained even if | ||||
| event-based retention has been set, and the duration of | ||||
| event-based retention has been reached. The server <bcp14>MAY</bcp14> | ||||
| restrict the enabling of event-based retention or the duration | ||||
| of event-based retention on the basis of the | ||||
| ACE4_WRITE_RETENTION ACL permission. The enabling of | ||||
| event-based retention <bcp14>MUST NOT</bcp14> prevent the enabling of | ||||
| non-event-based retention or the modification of the | ||||
| retention_hold attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="attrdef_retention_hold" numbered="true"> | ||||
| <name>Attribute 73: retention_hold</name> | ||||
| <t> | ||||
| Gets or sets administrative retention holds, one hold per bit | ||||
| position. | ||||
| </t> | ||||
| <t> | ||||
| This attribute allows from one to 64 administrative holds, one hold | ||||
| per bit of the attribute. If retention_hold is not zero, then | ||||
| the file <bcp14>MUST NOT</bcp14> be deleted, renamed, or modified, even if | ||||
| the duration on enabled event or non-event-based retention has | ||||
| been reached. The server <bcp14>MAY</bcp14> restrict the modification of | ||||
| retention_hold on the basis of the ACE4_WRITE_RETENTION_HOLD | ||||
| ACL permission. The enabling of administrative retention | ||||
| holds does not prevent the enabling of event-based or | ||||
| non-event-based retention. | ||||
| </t> | ||||
| <t> | ||||
| If the principal attempting to change retention_hold does | ||||
| not have ACE4_WRITE_RETENTION_HOLD permissions, | ||||
| the attempt <bcp14>MUST</bcp14> fail with NFS4ERR_ACCESS. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="acl" numbered="true" toc="default"> | ||||
| <name>Access Control Attributes</name> | ||||
| <t> | ||||
| Access Control Lists (ACLs) are file attributes that specify | ||||
| fine-grained access control. This section covers the | ||||
| "acl", "dacl", "sacl", | ||||
| "aclsupport", "mode", and | ||||
| "mode_set_masked" file attributes and their | ||||
| interactions. Note that file attributes may apply to any file | ||||
| system object. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Goals</name> | ||||
| <t> | ||||
| ACLs and modes represent two well-established models for | ||||
| specifying permissions. This section specifies requirements | ||||
| that attempt to meet the following goals: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If a server supports the mode attribute, it should provide | ||||
| reasonable semantics to clients that only set and retrieve | ||||
| the mode attribute. | ||||
| </li> | ||||
| <li> | ||||
| If a server supports ACL attributes, it should provide | ||||
| reasonable semantics to clients that only set and retrieve | ||||
| those attributes. | ||||
| </li> | ||||
| <li> | ||||
| On servers that support the mode attribute, if ACL | ||||
| attributes have never been set on an object, via | ||||
| inheritance or explicitly, the behavior should be | ||||
| traditional UNIX-like behavior. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| On servers that support the mode attribute, if the ACL | ||||
| attributes have been previously set on an object, either | ||||
| explicitly or via inheritance: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Setting only the mode attribute should effectively | ||||
| control the traditional UNIX-like permissions of read, | ||||
| write, and execute on owner, owner_group, and other. | ||||
| </li> | ||||
| <li> | ||||
| Setting only the mode attribute should provide | ||||
| reasonable security. For example, setting a mode of | ||||
| 000 should be enough to ensure that future OPEN operations for | ||||
| OPEN4_SHARE_ACCESS_READ or OPEN4_SHARE_ACCESS_WRITE by any principal fail, regardless of a | ||||
| previously existing or inherited ACL. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| NFSv4.1 may introduce different | ||||
| semantics relating to the mode and ACL attributes, | ||||
| but it does not render invalid any previously | ||||
| existing implementations. Additionally, this | ||||
| section provides clarifications based on previous | ||||
| implementations and discussions around them. | ||||
| </li> | ||||
| <li> | ||||
| On servers that support both the mode and the acl or | ||||
| dacl attributes, the server must keep the two consistent | ||||
| with each other. The value of the mode attribute (with | ||||
| the exception of the three high-order bits described in | ||||
| <xref target="attrdef_mode" format="default"/>) must be determined entirely | ||||
| by the value of the ACL, so that use of the mode is | ||||
| never required for anything other than setting the | ||||
| three high-order bits. See <xref target="setattr" format="default"/> | ||||
| for exact requirements. | ||||
| </li> | ||||
| <li> | ||||
| When a mode attribute is set on an object, the ACL | ||||
| attributes may need to be modified in order to not conflict | ||||
| with the new mode. In such cases, it is desirable that the | ||||
| ACL keep as much information as possible. This includes | ||||
| information about inheritance, AUDIT and ALARM ACEs, and | ||||
| permissions granted and denied that do not conflict with | ||||
| the new mode. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>File Attributes Discussion</name> | ||||
| <section anchor="attrdef_acl" numbered="true" toc="default"> | ||||
| <name>Attribute 12: acl</name> | ||||
| <t> | ||||
| The NFSv4.1 ACL attribute contains an array of Access | ||||
| Control Entries (ACEs) that are associated with the file | ||||
| system object. Although the client can set and | ||||
| get the acl attribute, the server is responsible for using | ||||
| the ACL to perform access control. The client can use the | ||||
| OPEN or ACCESS operations to check access without modifying | ||||
| or reading data or metadata. | ||||
| </t> | ||||
| <t> | ||||
| The NFS ACE structure is defined as follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| typedef uint32_t acetype4; | ||||
| typedef uint32_t aceflag4; | ||||
| typedef uint32_t acemask4; | ||||
| struct nfsace4 { | ||||
| acetype4 type; | ||||
| aceflag4 flag; | ||||
| acemask4 access_mask; | ||||
| utf8str_mixed who; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| To determine if a request succeeds, the server processes | ||||
| each nfsace4 entry in order. Only ACEs that have a "who" | ||||
| that matches the requester are considered. Each ACE is | ||||
| processed until all of the bits of the requester's access | ||||
| have been ALLOWED. Once a bit (see below) has been ALLOWED | ||||
| by an ACCESS_ALLOWED_ACE, it is no longer considered in the | ||||
| processing of later ACEs. If an ACCESS_DENIED_ACE is | ||||
| encountered where the requester's access still has unALLOWED | ||||
| bits in common with the "access_mask" of the ACE, the | ||||
| request is denied. When the ACL is fully processed, if | ||||
| there are bits in the requester's mask that have not been | ||||
| ALLOWED or DENIED, access is denied. | ||||
| </t> | ||||
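| <t> | ||||
| The ordered evaluation described above can be sketched as follows. This is a non-normative illustration: each ACE is modeled as a (type, access_mask, who) tuple with the flag field omitted, the function name is hypothetical, and the matching of "who" against the requester is left to a caller-supplied predicate. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
ALLOW, DENY, AUDIT, ALARM = 0, 1, 2, 3  # values of the acetype4 constants


def access_permitted(acl, requester_matches, requested_mask):
    """Evaluate ALLOW and DENY ACEs in order.

    acl               -- sequence of (ace_type, access_mask, who) tuples
    requester_matches -- predicate: does this "who" match the requester?
    requested_mask    -- bits of access the requester is asking for
    """
    remaining = requested_mask          # bits not yet ALLOWED
    for ace_type, access_mask, who in acl:
        if not requester_matches(who):
            continue                    # only matching ACEs are considered
        if ace_type == ALLOW:
            remaining &= ~access_mask   # these bits are no longer considered
            if remaining == 0:
                return True             # every requested bit was ALLOWED
        elif ace_type == DENY:
            if remaining & access_mask:
                return False            # a still-unALLOWED bit is denied
        # AUDIT and ALARM ACEs do not affect the access decision.
    return remaining == 0               # leftover bits mean access is denied
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Because evaluation stops at the first decisive ACE, the order of the entries matters: a DENY placed before an ALLOW for the same bit denies the request, while the reverse order permits it. | ||||
| </t> | ||||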
| <t> | ||||
| Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE | ||||
| types do not affect a requester's access, and instead are | ||||
| for triggering events as a result of a requester's access | ||||
| attempt. Therefore, AUDIT and ALARM ACEs are processed only | ||||
| after processing ALLOW and DENY ACEs. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 ACL model is quite rich. Some server | ||||
| platforms may provide access-control functionality that goes | ||||
| beyond the UNIX-style mode attribute, but that is not as | ||||
| rich as the NFS ACL model. So that users can take advantage | ||||
| of this more limited functionality, the server may support | ||||
| the acl attributes by mapping between its ACL model and the | ||||
| NFSv4.1 ACL model. Servers must ensure that the ACL | ||||
| they actually store or enforce is at least as strict as the | ||||
| NFSv4 ACL that was set. It is tempting to accomplish this | ||||
| by rejecting any ACL that falls outside the small set that | ||||
| can be represented accurately. However, such an approach | ||||
| can render ACLs unusable without special client-side | ||||
| knowledge of the server's mapping, which defeats the purpose | ||||
| of having a common NFSv4 ACL protocol. Therefore, servers | ||||
| should accept every ACL that they can without compromising | ||||
| security. To help accomplish this, servers may make a | ||||
| special exception, in the case of unsupported permission | ||||
| bits, to the rule that bits not ALLOWED or DENIED by an ACL | ||||
| must be denied. For example, a UNIX-style server might | ||||
| choose to silently allow read attribute permissions even | ||||
| though an ACL does not explicitly allow those permissions. | ||||
| (An ACL that explicitly denies permission to read attributes | ||||
| should still be rejected.) | ||||
| </t> | ||||
| <t> | ||||
| The situation is complicated by the fact that a server may | ||||
| have multiple modules that enforce ACLs. For example, the | ||||
| enforcement for NFSv4.1 access may be different from, | ||||
| but not weaker than, the enforcement for local access, and | ||||
| both may be different from the enforcement for access | ||||
| through other protocols such as SMB (Server Message Block). So it may be useful for | ||||
| a server to accept an ACL even if not all of its modules are | ||||
| able to support it. | ||||
| </t> | ||||
| <t> | ||||
| The guiding principle with regard to NFSv4 access is | ||||
| that the server must not accept ACLs that appear to | ||||
| make access to the file more restrictive than it really is. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>ACE Type</name> | ||||
| <t> | ||||
| The constants used for the type field (acetype4) are as | ||||
| follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; | ||||
| const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; | ||||
| const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; | ||||
| const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Only the ALLOW and DENY types may be used in the | ||||
| dacl attribute, and only the AUDIT and ALARM types may be | ||||
| used in the sacl attribute. All four are permitted in the | ||||
| acl attribute. | ||||
| </t> | ||||
| <table align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Value</th> | ||||
| <th align="left">Abbreviation</th> | ||||
| <th align="left">Description</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">ACE4_ACCESS_ALLOWED_ACE_TYPE</td> | ||||
| <td align="left">ALLOW</td> | ||||
| <td align="left"> | ||||
| Explicitly grants the access defined in acemask4 to | ||||
| the file or directory. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">ACE4_ACCESS_DENIED_ACE_TYPE</td> | ||||
| <td align="left">DENY</td> | ||||
| <td align="left"> | ||||
| Explicitly denies the access defined in acemask4 to | ||||
| the file or directory. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">ACE4_SYSTEM_AUDIT_ACE_TYPE</td> | ||||
| <td align="left">AUDIT</td> | ||||
| <td align="left"> | ||||
| Log (in a system-dependent way) any access attempt to | ||||
| a file or directory that uses any of the access | ||||
| methods specified in acemask4. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">ACE4_SYSTEM_ALARM_ACE_TYPE</td> | ||||
| <td align="left">ALARM</td> | ||||
| <td align="left"> | ||||
| Generate an alarm (in a system-dependent way) when any | ||||
| access attempt is made to a file or directory for the | ||||
| access methods specified in acemask4. | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| The "Abbreviation" column denotes how the | ||||
| types will be referred to throughout the rest of this | ||||
| section. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="attrdef_aclsupport" numbered="true" toc="default"> | ||||
| <name>Attribute 13: aclsupport</name> | ||||
| <t> | ||||
| A server need not support all of the above ACE types. | ||||
| This attribute indicates which ACE types are supported for | ||||
| the current file system. The bitmask constants used to | ||||
| represent the above definitions within the aclsupport | ||||
| attribute are as follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; | ||||
| const ACL4_SUPPORT_DENY_ACL = 0x00000002; | ||||
| const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; | ||||
| const ACL4_SUPPORT_ALARM_ACL = 0x00000008; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Servers that support either the ALLOW or DENY ACE type | ||||
| <bcp14>SHOULD</bcp14> support both ALLOW and DENY ACE types. | ||||
| </t> | ||||
| <t> | ||||
| Clients should not attempt to set an ACE unless the server | ||||
| claims support for that ACE type. If the server receives a | ||||
| request to set an ACE that it cannot store, it <bcp14>MUST</bcp14> reject | ||||
| the request with NFS4ERR_ATTRNOTSUPP. If the server | ||||
| receives a request to set an ACE that it can store but | ||||
| cannot enforce, the server <bcp14>SHOULD</bcp14> reject the request with | ||||
| NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
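| <t> | ||||
| A client could use the aclsupport value to screen an ACL before sending it, as in the following non-normative sketch. The mapping table and function name are hypothetical; the constant values are those given above. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
ACL4_SUPPORT_ALLOW_ACL = 0x00000001
ACL4_SUPPORT_DENY_ACL = 0x00000002
ACL4_SUPPORT_AUDIT_ACL = 0x00000004
ACL4_SUPPORT_ALARM_ACL = 0x00000008

# Map each acetype4 value to the aclsupport bit that advertises it.
SUPPORT_BIT = {
    0: ACL4_SUPPORT_ALLOW_ACL,   # ACE4_ACCESS_ALLOWED_ACE_TYPE
    1: ACL4_SUPPORT_DENY_ACL,    # ACE4_ACCESS_DENIED_ACE_TYPE
    2: ACL4_SUPPORT_AUDIT_ACL,   # ACE4_SYSTEM_AUDIT_ACE_TYPE
    3: ACL4_SUPPORT_ALARM_ACL,   # ACE4_SYSTEM_ALARM_ACE_TYPE
}


def unsupported_types(aclsupport, ace_types):
    """Return the ACE types in ace_types that the server does not claim."""
    return [t for t in ace_types if not (aclsupport & SUPPORT_BIT[t])]
```
| ]]></sourcecode> | ||||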
| <t> | ||||
| Support for any of the ACL attributes is | ||||
| optional (albeit <bcp14>RECOMMENDED</bcp14>). | ||||
| However, a server that supports either of the new ACL | ||||
| attributes (dacl or sacl) <bcp14>MUST</bcp14> allow use of the new ACL | ||||
| attributes to access all of the ACE types that it | ||||
| supports. In other words, if such a server supports ALLOW | ||||
| or DENY ACEs, then it <bcp14>MUST</bcp14> support the dacl attribute, and | ||||
| if it supports AUDIT or ALARM ACEs, then it <bcp14>MUST</bcp14> support | ||||
| the sacl attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="acemask" numbered="true" toc="default"> | ||||
| <name>ACE Access Mask</name> | ||||
| <t> | ||||
| The bitmask constants used for the access mask field | ||||
| are as follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACE4_READ_DATA = 0x00000001; | ||||
| const ACE4_LIST_DIRECTORY = 0x00000001; | ||||
| const ACE4_WRITE_DATA = 0x00000002; | ||||
| const ACE4_ADD_FILE = 0x00000002; | ||||
| const ACE4_APPEND_DATA = 0x00000004; | ||||
| const ACE4_ADD_SUBDIRECTORY = 0x00000004; | ||||
| const ACE4_READ_NAMED_ATTRS = 0x00000008; | ||||
| const ACE4_WRITE_NAMED_ATTRS = 0x00000010; | ||||
| const ACE4_EXECUTE = 0x00000020; | ||||
| const ACE4_DELETE_CHILD = 0x00000040; | ||||
| const ACE4_READ_ATTRIBUTES = 0x00000080; | ||||
| const ACE4_WRITE_ATTRIBUTES = 0x00000100; | ||||
| const ACE4_WRITE_RETENTION = 0x00000200; | ||||
| const ACE4_WRITE_RETENTION_HOLD = 0x00000400; | ||||
| const ACE4_DELETE = 0x00010000; | ||||
| const ACE4_READ_ACL = 0x00020000; | ||||
| const ACE4_WRITE_ACL = 0x00040000; | ||||
| const ACE4_WRITE_OWNER = 0x00080000; | ||||
| const ACE4_SYNCHRONIZE = 0x00100000; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Note that some masks have coincident values, for | ||||
| example, ACE4_READ_DATA and ACE4_LIST_DIRECTORY. | ||||
| The mask entries ACE4_LIST_DIRECTORY, | ||||
| ACE4_ADD_FILE, and ACE4_ADD_SUBDIRECTORY are | ||||
| intended to be used with directory objects, | ||||
| while ACE4_READ_DATA, ACE4_WRITE_DATA, and | ||||
| ACE4_APPEND_DATA are intended to be used with | ||||
| non-directory objects. | ||||
| </t> | ||||
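| <t> | ||||
| Because some mask bits carry two names, a tool that prints an access mask needs to know whether the object is a directory. The following non-normative sketch shows one way to do that; the table layout and function name are hypothetical, and the values are those of the constants above. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Names for the three coincident bits, by object kind.
FILE_NAMES = {0x1: "ACE4_READ_DATA", 0x2: "ACE4_WRITE_DATA",
              0x4: "ACE4_APPEND_DATA"}
DIR_NAMES = {0x1: "ACE4_LIST_DIRECTORY", 0x2: "ACE4_ADD_FILE",
             0x4: "ACE4_ADD_SUBDIRECTORY"}

# Bits whose name does not depend on the object kind.
COMMON_NAMES = {
    0x00000008: "ACE4_READ_NAMED_ATTRS",
    0x00000010: "ACE4_WRITE_NAMED_ATTRS",
    0x00000020: "ACE4_EXECUTE",
    0x00000040: "ACE4_DELETE_CHILD",
    0x00000080: "ACE4_READ_ATTRIBUTES",
    0x00000100: "ACE4_WRITE_ATTRIBUTES",
    0x00000200: "ACE4_WRITE_RETENTION",
    0x00000400: "ACE4_WRITE_RETENTION_HOLD",
    0x00010000: "ACE4_DELETE",
    0x00020000: "ACE4_READ_ACL",
    0x00040000: "ACE4_WRITE_ACL",
    0x00080000: "ACE4_WRITE_OWNER",
    0x00100000: "ACE4_SYNCHRONIZE",
}


def mask_names(mask, is_directory):
    """List the names of the bits set in mask, lowest bit first."""
    table = dict(COMMON_NAMES)
    table.update(DIR_NAMES if is_directory else FILE_NAMES)
    return [name for bit, name in sorted(table.items()) if mask & bit]
```
| ]]></sourcecode> | ||||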
| <section numbered="true" toc="default"> | ||||
| <name>Discussion of Mask Attributes</name> | ||||
| <t>ACE4_READ_DATA</t> | ||||
| <ul empty="true"><li> <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>READ</t> | ||||
| <t>OPEN</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Permission to read the data of the file. | ||||
| </t> | ||||
| <t> | ||||
| Servers <bcp14>SHOULD</bcp14> allow a user the ability to read the data | ||||
| of the file when only the ACE4_EXECUTE access mask bit is | ||||
| allowed. | ||||
| </t> | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_LIST_DIRECTORY</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>READDIR</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to list the contents of a directory. | ||||
| </dd> | ||||
| </dl> | ||||
| </li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_DATA</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>WRITE</t> | ||||
| <t>OPEN</t> | ||||
| <t>SETATTR of size</t> | ||||
| </dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to modify a file's data. | ||||
| </dd> | ||||
| </dl> | ||||
| </li></ul> | ||||
| <t>ACE4_ADD_FILE</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>CREATE</t> | ||||
| <t>LINK</t> | ||||
| <t>OPEN</t> | ||||
| <t>RENAME</t> | ||||
| </dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to add a new file in a directory. | ||||
| The CREATE operation is affected when nfs_ftype4 | ||||
| is NF4LNK, NF4BLK, NF4CHR, NF4SOCK, or | ||||
| NF4FIFO. (NF4DIR is not listed because it is | ||||
| covered by ACE4_ADD_SUBDIRECTORY.) OPEN is | ||||
| affected when used to create a regular file. | ||||
| LINK and RENAME are always affected. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_APPEND_DATA</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>WRITE</t> | ||||
| <t>OPEN</t> | ||||
| <t>SETATTR of size</t> | ||||
| </dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| The ability to modify a file's data, but only | ||||
| starting at EOF. This allows for the notion of | ||||
| append-only files, by allowing ACE4_APPEND_DATA | ||||
| and denying ACE4_WRITE_DATA to the same user or | ||||
| group. If a file has an ACL such as the one | ||||
| described above and a WRITE request is made for | ||||
| somewhere other than EOF, the server <bcp14>SHOULD</bcp14> | ||||
| return NFS4ERR_ACCESS. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
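| <t> | ||||
| The append-only behavior described for ACE4_APPEND_DATA | ||||
| can be illustrated with a minimal sketch. It assumes that | ||||
| ACL evaluation has already produced the set of mask bits | ||||
| granted to the requester; the function name check_write | ||||
| and its parameters are hypothetical and are not part of | ||||
| the protocol. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Access mask bits and status codes as defined by NFSv4.1.
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
NFS4_OK = 0
NFS4ERR_ACCESS = 13


def check_write(allowed_mask, offset, file_size):
    """Return the status a server might give a WRITE at `offset`.

    `allowed_mask` is an assumed input: the mask bits that ACL
    evaluation has already granted to the requester.
    """
    if allowed_mask & ACE4_WRITE_DATA:
        return NFS4_OK  # general write permission, any offset
    if (allowed_mask & ACE4_APPEND_DATA) and offset == file_size:
        return NFS4_OK  # append-only: the write starts exactly at EOF
    return NFS4ERR_ACCESS
```
| ]]></sourcecode> | ||||
| <t> | ||||
| With only ACE4_APPEND_DATA granted, a WRITE at EOF is | ||||
| accepted in this sketch, while a WRITE at any other | ||||
| offset yields NFS4ERR_ACCESS. | ||||
| </t> | ||||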
| <t>ACE4_ADD_SUBDIRECTORY</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>CREATE</t> | ||||
| <t>RENAME</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to create a subdirectory in a | ||||
| directory. The CREATE operation is affected | ||||
| when nfs_ftype4 is NF4DIR. The RENAME operation | ||||
| is always affected. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
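| <t> | ||||
| As an illustration of how the object type selects between | ||||
| ACE4_ADD_FILE and ACE4_ADD_SUBDIRECTORY for a CREATE | ||||
| operation, consider the following sketch. The helper name | ||||
| mask_for_create is hypothetical; the type and mask values | ||||
| are those of the protocol's XDR definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# nfs_ftype4 values and access mask bits as defined by NFSv4.1.
NF4REG, NF4DIR, NF4BLK, NF4CHR, NF4LNK, NF4SOCK, NF4FIFO = 1, 2, 3, 4, 5, 6, 7
ACE4_ADD_FILE = 0x00000002
ACE4_ADD_SUBDIRECTORY = 0x00000004


def mask_for_create(ftype):
    """Return the mask bit checked on the parent directory for CREATE."""
    if ftype == NF4DIR:
        return ACE4_ADD_SUBDIRECTORY
    if ftype in (NF4LNK, NF4BLK, NF4CHR, NF4SOCK, NF4FIFO):
        return ACE4_ADD_FILE
    # Regular files are created with OPEN rather than CREATE.
    raise ValueError("type is not created with CREATE")
```
| ]]></sourcecode> | ||||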
| <t>ACE4_READ_NAMED_ATTRS</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>OPENATTR</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to read the named attributes of a | ||||
| file or to look up the named attribute | ||||
| directory. OPENATTR is affected when it is not | ||||
| used to create a named attribute directory. | ||||
| This is when 1) createdir is TRUE, but a named | ||||
| attribute directory already exists, or 2) | ||||
| createdir is FALSE. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_NAMED_ATTRS</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>OPENATTR</t> | ||||
| </dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to write the named attributes of a | ||||
| file or to create a named attribute directory. | ||||
| OPENATTR is affected when it is used to create a | ||||
| named attribute directory. This is when | ||||
| createdir is TRUE and no named attribute | ||||
| directory exists. The ability to check whether | ||||
| or not a named attribute directory exists | ||||
| depends on the ability to look it up; therefore, | ||||
| users also need the ACE4_READ_NAMED_ATTRS | ||||
| permission in order to create a named attribute | ||||
| directory. | ||||
| </dd> | ||||
| </dl> | ||||
| </li> | ||||
| </ul> | ||||
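| <t> | ||||
| The two OPENATTR cases above can be summarized in a short | ||||
| sketch. The function masks_for_openattr and its boolean | ||||
| parameter attrdir_exists are hypothetical names used only | ||||
| for illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Access mask bits as defined by NFSv4.1.
ACE4_READ_NAMED_ATTRS = 0x00000008
ACE4_WRITE_NAMED_ATTRS = 0x00000010


def masks_for_openattr(createdir, attrdir_exists):
    """Return the mask bits an OPENATTR request depends on."""
    if createdir and not attrdir_exists:
        # Creating the directory needs write permission, and read
        # permission is needed to learn that it does not exist yet.
        return ACE4_WRITE_NAMED_ATTRS | ACE4_READ_NAMED_ATTRS
    # Either createdir is FALSE, or the directory already exists.
    return ACE4_READ_NAMED_ATTRS
```
| ]]></sourcecode> | ||||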
| <t>ACE4_EXECUTE</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>READ</t> | ||||
| <t>OPEN</t> | ||||
| <t>REMOVE</t> | ||||
| <t>RENAME</t> | ||||
| <t>LINK</t> | ||||
| <t>CREATE</t> | ||||
| </dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Permission to execute a file. | ||||
| </t> | ||||
| <t> | ||||
| Servers <bcp14>SHOULD</bcp14> allow a | ||||
| user the ability to read the data of the file | ||||
| when only the ACE4_EXECUTE access mask bit is | ||||
| allowed. This is because there is no way to | ||||
| execute a file without reading the contents. | ||||
| Though a server may treat ACE4_EXECUTE and | ||||
| ACE4_READ_DATA bits identically when deciding to | ||||
| permit a READ operation, it <bcp14>SHOULD</bcp14> still allow | ||||
| the two bits to be set independently in ACLs, | ||||
| and <bcp14>MUST</bcp14> distinguish between them when replying | ||||
| to ACCESS operations. In particular, servers | ||||
| <bcp14>SHOULD NOT</bcp14> silently turn on one of the two bits | ||||
| when the other is set, as that would make it | ||||
| impossible for the client to correctly enforce | ||||
| the distinction between read and execute | ||||
| permissions. | ||||
| </t> | ||||
| <t>As an example, following a SETATTR of the following ACL:</t> | ||||
| <ul empty="true"> | ||||
| <li>nfsuser:ACE4_EXECUTE:ALLOW</li> | ||||
| </ul> | ||||
| <t> | ||||
| A subsequent GETATTR of ACL for that file <bcp14>SHOULD</bcp14> return: | ||||
| </t> | ||||
| <ul empty="true"> | ||||
| <li>nfsuser:ACE4_EXECUTE:ALLOW</li> | ||||
| </ul> | ||||
| <t> | ||||
| Rather than: | ||||
| </t> | ||||
| <ul empty="true"> | ||||
| <li> | ||||
| nfsuser:ACE4_EXECUTE/ACE4_READ_DATA:ALLOW | ||||
| </li></ul> | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
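| <t> | ||||
| The distinction drawn above for ACE4_EXECUTE on files can | ||||
| be sketched as follows: a server may let either bit | ||||
| satisfy a READ, yet it reports the two bits separately in | ||||
| an ACCESS reply. The function names are hypothetical, and | ||||
| allowed_mask is assumed to be the result of ACL | ||||
| evaluation for the requester. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# ACE mask bits and ACCESS operation bits as defined by NFSv4.1.
ACE4_READ_DATA = 0x00000001
ACE4_EXECUTE = 0x00000020
ACCESS4_READ = 0x00000001
ACCESS4_EXECUTE = 0x00000020


def may_read(allowed_mask):
    """A READ is permitted when either bit is allowed."""
    return bool(allowed_mask & (ACE4_READ_DATA | ACE4_EXECUTE))


def access_reply(allowed_mask):
    """Report read and execute independently, never one for the other."""
    reply = 0
    if allowed_mask & ACE4_READ_DATA:
        reply |= ACCESS4_READ
    if allowed_mask & ACE4_EXECUTE:
        reply |= ACCESS4_EXECUTE
    return reply
```
| ]]></sourcecode> | ||||
| <t> | ||||
| In this sketch a user holding only ACE4_EXECUTE can READ | ||||
| the file, but an ACCESS reply shows only the execute bit, | ||||
| so the client can still tell the two permissions apart. | ||||
| </t> | ||||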
| <t>ACE4_EXECUTE</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>LOOKUP</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to traverse/search a directory. | ||||
| </dd> | ||||
| </dl> | ||||
| </li></ul> | ||||
| <t>ACE4_DELETE_CHILD</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>REMOVE</t> | ||||
| <t>RENAME</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to delete a file or directory within | ||||
| a directory. | ||||
| See <xref target="delete-delete_child" format="default"/> | ||||
| for information on how ACE4_DELETE and | ||||
| ACE4_DELETE_CHILD interact. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_READ_ATTRIBUTES</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>GETATTR of file system object attributes</t> | ||||
| <t>VERIFY</t> | ||||
| <t>NVERIFY</t> | ||||
| <t>READDIR</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| The ability to read basic attributes (non-ACLs) | ||||
| of a file. On a UNIX system, basic attributes | ||||
| can be thought of as the stat-level attributes. | ||||
| Allowing this access mask bit would mean that the | ||||
| entity can execute "ls -l" and stat. If a | ||||
| READDIR operation requests attributes, this mask | ||||
| must be allowed for the READDIR to succeed. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_ATTRIBUTES</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>SETATTR of time_access_set, time_backup,</t> | ||||
| <t>time_create, time_modify_set, mimetype, hidden, system</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to change the times associated with a | ||||
| file or directory to an arbitrary value. Also | ||||
| permission to change the mimetype, hidden, and | ||||
| system attributes. A user having | ||||
| ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be | ||||
| allowed to set the times associated with a file | ||||
| to the current server time. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_RETENTION</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>SETATTR of retention_set, retentevt_set.</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to modify the durations of event and | ||||
| non-event-based retention. Also permission to | ||||
| enable event and non-event-based retention. A | ||||
| server <bcp14>MAY</bcp14> behave such that setting | ||||
| ACE4_WRITE_ATTRIBUTES allows | ||||
| ACE4_WRITE_RETENTION. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_RETENTION_HOLD</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>SETATTR of retention_hold.</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to modify the administrative | ||||
| retention holds. A server <bcp14>MAY</bcp14> map | ||||
| ACE4_WRITE_ATTRIBUTES to | ||||
| ACE4_WRITE_RETENTION_HOLD. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_DELETE</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>REMOVE</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to delete the | ||||
| file or directory. | ||||
| See <xref target="delete-delete_child" format="default"/> | ||||
| for information on how ACE4_DELETE and | ||||
| ACE4_DELETE_CHILD interact. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_READ_ACL</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd><t>GETATTR of acl, dacl, or sacl</t> | ||||
| <t>NVERIFY</t> | ||||
| <t>VERIFY</t></dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to read the ACL. | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_ACL</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>SETATTR of acl and mode</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd>Permission to write the acl and mode attributes.</dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_WRITE_OWNER</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>SETATTR of owner and owner_group</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| Permission to write the owner and owner_group | ||||
| attributes. On UNIX systems, this is the | ||||
| ability to execute chown() and chgrp(). | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t>ACE4_SYNCHRONIZE</t> | ||||
| <ul empty="true"><li> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>Operation(s) affected:</dt> | ||||
| <dd>NONE</dd> | ||||
| <dt>Discussion:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Permission to use the file object as a | ||||
| synchronization primitive for interprocess | ||||
| communication. This permission is not enforced | ||||
| or interpreted by the NFSv4.1 server on behalf of | ||||
| the client. | ||||
| </t> | ||||
| <t> | ||||
| Typically, the ACE4_SYNCHRONIZE permission is | ||||
| only meaningful on local file systems, i.e., | ||||
| file systems not accessed via NFSv4.1. The reason | ||||
| that the permission bit exists is that some operating | ||||
| environments, such as Windows, use ACE4_SYNCHRONIZE. | ||||
| </t> | ||||
| <t> | ||||
| For example, if a client copies a file that has | ||||
| ACE4_SYNCHRONIZE set from a local file system to | ||||
| an NFSv4.1 server, and then later copies the file | ||||
| from the NFSv4.1 server to a local file system, | ||||
| it is likely that if ACE4_SYNCHRONIZE was set | ||||
| in the original file, the client will want it | ||||
| set in the second copy. The first copy will not | ||||
| have the permission set unless the NFSv4.1 server | ||||
| has the means to set the ACE4_SYNCHRONIZE bit. The | ||||
| second copy will not have the permission set unless | ||||
| the NFSv4.1 server has the means to retrieve the | ||||
| ACE4_SYNCHRONIZE bit. | ||||
| </t> | ||||
| </dd> | ||||
| </dl></li> | ||||
| </ul> | ||||
| <t> | ||||
| Server implementations need not provide the granularity | ||||
| of control that is implied by this list of masks. For | ||||
| example, POSIX-based systems might not distinguish | ||||
| ACE4_APPEND_DATA (the ability to append to a file) from | ||||
| ACE4_WRITE_DATA (the ability to modify existing | ||||
| contents); both masks would be tied to a single "write" | ||||
| permission <xref target="chmod" format="default"/>. When such a server returns attributes to the | ||||
| client, it would show both ACE4_APPEND_DATA and | ||||
| ACE4_WRITE_DATA if and only if the write permission is | ||||
| enabled. | ||||
| </t> | ||||
| <t> | ||||
| If a server receives a SETATTR request that it cannot | ||||
| accurately implement, it should err in the direction of | ||||
| more restricted access, except in the previously | ||||
| discussed cases of execute and read. For example, | ||||
| suppose a server cannot distinguish overwriting data | ||||
| from appending new data, as described in the previous | ||||
| paragraph. If a client submits an ALLOW ACE where | ||||
| ACE4_APPEND_DATA is set but ACE4_WRITE_DATA is not (or | ||||
| vice versa), the server should either turn off | ||||
| ACE4_APPEND_DATA or reject the request with | ||||
| NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
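| <t> | ||||
| A minimal sketch of the two behaviors just described, for | ||||
| a server that has a single "write" permission. The names | ||||
| reported_write_bits and normalize_allow_mask, and the | ||||
| reject parameter that selects between the two permitted | ||||
| responses, are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Access mask bits and status codes as defined by NFSv4.1.
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
WRITE_PAIR = ACE4_WRITE_DATA | ACE4_APPEND_DATA
NFS4_OK = 0
NFS4ERR_ATTRNOTSUPP = 10032


def reported_write_bits(write_enabled):
    """Both bits are shown if and only if the write permission is on."""
    return WRITE_PAIR if write_enabled else 0


def normalize_allow_mask(mask, reject=False):
    """Handle an ALLOW mask the server cannot store exactly.

    Returns (status, mask). When only one bit of the pair is set,
    either reject the request or err toward more restricted access
    by turning that bit off.
    """
    pair = mask & WRITE_PAIR
    if pair == 0 or pair == WRITE_PAIR:
        return NFS4_OK, mask  # representable as given
    if reject:
        return NFS4ERR_ATTRNOTSUPP, None
    return NFS4_OK, mask & ~WRITE_PAIR
```
| ]]></sourcecode> | ||||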
| </section> | ||||
| <section anchor="delete-delete_child" numbered="true" toc="default"> | ||||
| <name>ACE4_DELETE vs. ACE4_DELETE_CHILD</name> | ||||
| <t> | ||||
| Two access mask bits govern the ability to delete a | ||||
| directory entry: ACE4_DELETE on the object | ||||
| itself (the "target") and ACE4_DELETE_CHILD on | ||||
| the containing directory (the "parent"). | ||||
| </t> | ||||
| <t> | ||||
| Many systems also take the "sticky bit" (MODE4_SVTX) | ||||
| on a directory to allow unlink only to a user that | ||||
| owns either the target or the parent; on some | ||||
| such systems the decision also depends on | ||||
| whether the target is writable. | ||||
| </t> | ||||
| <t> | ||||
| Servers <bcp14>SHOULD</bcp14> allow unlink if either ACE4_DELETE | ||||
| is permitted on the target, or ACE4_DELETE_CHILD is | ||||
| permitted on the parent. (Note that this is | ||||
| true even if the parent or target explicitly | ||||
| denies one of these permissions.) | ||||
| </t> | ||||
| <t> | ||||
| If the ACLs in question neither explicitly ALLOW | ||||
| nor DENY either of the above, and if MODE4_SVTX is | ||||
| not set on the parent, then the server <bcp14>SHOULD</bcp14> allow | ||||
| the removal if and only if ACE4_ADD_FILE is permitted. | ||||
| In the case where MODE4_SVTX is set, the server | ||||
| may also require the remover to own either the parent | ||||
| or the target, or may require the target to be | ||||
| writable. | ||||
| </t> | ||||
| <t> | ||||
| This allows servers to support something close to | ||||
| traditional UNIX-like semantics, with ACE4_ADD_FILE | ||||
| taking the place of the write bit. | ||||
| </t> | ||||
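| <t> | ||||
| The decision described in this section can be sketched as | ||||
| follows. The sketch is one possible policy, not a | ||||
| normative algorithm: the tri-state inputs, the function | ||||
| name may_unlink, and the choice to require ownership when | ||||
| MODE4_SVTX is set are all assumptions made for | ||||
| illustration, as is denying the request when a | ||||
| permission is explicitly denied and neither is allowed. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Hypothetical tri-state result of evaluating one mask bit in an ACL.
ALLOW, DENY, UNSPECIFIED = "allow", "deny", None


def may_unlink(delete_on_target, delete_child_on_parent,
               add_file_on_parent, parent_sticky=False,
               owns_parent_or_target=False):
    """Decide whether a directory entry may be removed.

    delete_on_target:       ACE4_DELETE on the target (tri-state)
    delete_child_on_parent: ACE4_DELETE_CHILD on the parent (tri-state)
    add_file_on_parent:     True if ACE4_ADD_FILE is permitted on the parent
    parent_sticky:          True if MODE4_SVTX is set on the parent
    """
    # Either permission suffices, even if the other is explicitly denied.
    if delete_on_target == ALLOW or delete_child_on_parent == ALLOW:
        return True
    # Neither bit is mentioned: fall back to UNIX-like semantics,
    # with ACE4_ADD_FILE standing in for the write bit.
    if delete_on_target is UNSPECIFIED and delete_child_on_parent is UNSPECIFIED:
        if not add_file_on_parent:
            return False
        if parent_sticky:
            return owns_parent_or_target  # assumed sticky-bit policy
        return True
    # Assumption: an explicit DENY with no ALLOW refuses the removal.
    return False
```
| ]]></sourcecode> | ||||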
| </section> | ||||
| </section> | ||||
| <section anchor="aceflag" numbered="true" toc="default"> | ||||
| <name>ACE flag</name> | ||||
| <t> | ||||
| The bitmask constants used for the flag field are as | ||||
| follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACE4_FILE_INHERIT_ACE = 0x00000001; | ||||
| const ACE4_DIRECTORY_INHERIT_ACE = 0x00000002; | ||||
| const ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004; | ||||
| const ACE4_INHERIT_ONLY_ACE = 0x00000008; | ||||
| const ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010; | ||||
| const ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020; | ||||
| const ACE4_IDENTIFIER_GROUP = 0x00000040; | ||||
| const ACE4_INHERITED_ACE = 0x00000080; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| A server need not support any of these flags. If the | ||||
| server supports flags that are similar to, but not | ||||
| exactly the same as, these flags, the implementation | ||||
| may define a mapping between the protocol-defined | ||||
| flags and the implementation-defined flags. | ||||
| </t> | ||||
| <t> | ||||
| For example, suppose a client tries to set an ACE with | ||||
| ACE4_FILE_INHERIT_ACE set but not | ||||
| ACE4_DIRECTORY_INHERIT_ACE. If the server does not | ||||
| support any form of ACL inheritance, the server should | ||||
| reject the request with NFS4ERR_ATTRNOTSUPP. If the | ||||
| server supports a single "inherit ACE" flag that | ||||
| applies to both files and directories, the server may | ||||
| reject the request (i.e., requiring the client to set | ||||
| both the file and directory inheritance flags). The | ||||
| server may also accept the request and silently turn | ||||
| on the ACE4_DIRECTORY_INHERIT_ACE flag. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Discussion of Flag Bits</name> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>ACE4_FILE_INHERIT_ACE</dt> | ||||
| <dd> | ||||
| Any non-directory file in any | ||||
| sub-directory will get this ACE | ||||
| inherited. | ||||
| </dd> | ||||
| <dt>ACE4_DIRECTORY_INHERIT_ACE</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Can be placed on a directory and indicates | ||||
| that this ACE should be added to each new | ||||
| directory created. | ||||
| </t> | ||||
| <t> | ||||
| If this flag is set in an ACE in an ACL | ||||
| attribute to be set on a non-directory | ||||
| file system object, the operation | ||||
| attempting to set the ACL <bcp14>SHOULD</bcp14> fail | ||||
| with NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>ACE4_NO_PROPAGATE_INHERIT_ACE</dt> | ||||
| <dd> | ||||
| Can be placed on a directory. This flag | ||||
| tells the server that inheritance of this | ||||
| ACE should stop at newly created child | ||||
| directories. | ||||
| </dd> | ||||
| <dt>ACE4_INHERIT_ONLY_ACE</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Can be placed on a directory but does not | ||||
| apply to the directory; ALLOW and DENY ACEs | ||||
| with this bit set do not affect access to | ||||
| the directory, and AUDIT and ALARM ACEs | ||||
| with this bit set do not trigger log or | ||||
| alarm events. Such ACEs only take effect | ||||
| once they are applied (with this bit | ||||
| cleared) to newly created files and | ||||
| directories as specified by the | ||||
| ACE4_FILE_INHERIT_ACE and ACE4_DIRECTORY_INHERIT_ACE | ||||
| flags. | ||||
| </t> | ||||
| <t> | ||||
| If this flag is present on an ACE, but | ||||
| neither ACE4_DIRECTORY_INHERIT_ACE nor | ||||
| ACE4_FILE_INHERIT_ACE is present, then | ||||
| an operation attempting to set such an | ||||
| attribute <bcp14>SHOULD</bcp14> fail with | ||||
| NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>ACE4_SUCCESSFUL_ACCESS_ACE_FLAG</dt> | ||||
| <dd/> | ||||
| <dt>ACE4_FAILED_ACCESS_ACE_FLAG</dt> | ||||
| <dd> | ||||
| The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG | ||||
| (SUCCESS) and ACE4_FAILED_ACCESS_ACE_FLAG | ||||
| (FAILED) flag bits may be set only on | ||||
| ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and | ||||
| ACE4_SYSTEM_ALARM_ACE_TYPE (ALARM) ACE | ||||
| types. If during the processing of the | ||||
| file's ACL, the server encounters an AUDIT | ||||
| or ALARM ACE that matches the principal | ||||
| attempting the OPEN, the server notes that | ||||
| fact, and the presence, if any, of the | ||||
| SUCCESS and FAILED flags encountered in | ||||
| the AUDIT or ALARM ACE. Once the server | ||||
| completes the ACL processing, it then | ||||
| notes if the operation succeeded or | ||||
| failed. If the operation succeeded, and if | ||||
| the SUCCESS flag was set for a matching | ||||
| AUDIT or ALARM ACE, then the appropriate | ||||
| AUDIT or ALARM event occurs. If the | ||||
| operation failed, and if the FAILED flag | ||||
| was set for the matching AUDIT or ALARM | ||||
| ACE, then the appropriate AUDIT or ALARM | ||||
| event occurs. Either or both of the | ||||
| SUCCESS and FAILED flags can be set, but if | ||||
| neither is set, the AUDIT or ALARM ACE is | ||||
| not useful. | ||||
| </dd> | ||||
| <dt/> | ||||
| <dd> | ||||
| The previously described processing | ||||
| applies to ACCESS operations even when | ||||
| they return NFS4_OK. For the purposes of | ||||
| AUDIT and ALARM, we consider an ACCESS | ||||
| operation to be a "failure" if it fails | ||||
| to return a bit that was requested and | ||||
| supported. | ||||
| </dd> | ||||
| <dt>ACE4_IDENTIFIER_GROUP</dt> | ||||
| <dd> | ||||
| Indicates that the "who" refers to a GROUP | ||||
| as defined under UNIX or a GROUP ACCOUNT | ||||
| as defined under Windows. Clients and | ||||
| servers <bcp14>MUST</bcp14> ignore the | ||||
| ACE4_IDENTIFIER_GROUP flag on ACEs with a | ||||
| who value equal to one of the special | ||||
| identifiers outlined in | ||||
| <xref target="acewho" format="default"/>. | ||||
| </dd> | ||||
| <dt>ACE4_INHERITED_ACE</dt> | ||||
| <dd> | ||||
| Indicates that this ACE is inherited from | ||||
| a parent directory. A server that supports | ||||
| automatic inheritance will place | ||||
| this flag on any ACEs inherited from the | ||||
| parent directory when creating a new | ||||
| object. Client applications will use this | ||||
| to perform automatic inheritance. | ||||
| Clients and servers <bcp14>MUST</bcp14> clear this | ||||
| bit in the acl attribute; it may only | ||||
| be used in the dacl and sacl attributes. | ||||
| </dd> | ||||
| </dl> | ||||
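| <t> | ||||
| Two of the flag rules above lend themselves to a short | ||||
| sketch: the consistency checks that lead to | ||||
| NFS4ERR_ATTRNOTSUPP when an ACL is set, and the test for | ||||
| whether an AUDIT or ALARM ACE produces an event. The | ||||
| function names and the target_is_directory and succeeded | ||||
| parameters are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Flag bits and status codes as defined by NFSv4.1.
ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_INHERIT_ONLY_ACE = 0x00000008
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG = 0x00000010
ACE4_FAILED_ACCESS_ACE_FLAG = 0x00000020
NFS4_OK = 0
NFS4ERR_ATTRNOTSUPP = 10032


def check_ace_flags(flags, target_is_directory):
    """Check one ACE's flags before storing it on an object."""
    # Directory inheritance makes no sense on a non-directory object.
    if (flags & ACE4_DIRECTORY_INHERIT_ACE) and not target_is_directory:
        return NFS4ERR_ATTRNOTSUPP
    # Inherit-only needs at least one of the two inheritance flags.
    inherit = ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE
    if (flags & ACE4_INHERIT_ONLY_ACE) and not (flags & inherit):
        return NFS4ERR_ATTRNOTSUPP
    return NFS4_OK


def audit_event_fires(flags, succeeded):
    """True if a matching AUDIT or ALARM ACE triggers for this outcome."""
    if succeeded:
        wanted = ACE4_SUCCESSFUL_ACCESS_ACE_FLAG
    else:
        wanted = ACE4_FAILED_ACCESS_ACE_FLAG
    return bool(flags & wanted)
```
| ]]></sourcecode> | ||||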
| </section> | ||||
| </section> | ||||
| <section anchor="acewho" numbered="true" toc="default"> | ||||
| <name>ACE Who</name> | ||||
| <t> | ||||
| The "who" field of an ACE is an identifier that | ||||
| specifies the principal or principals to whom the ACE | ||||
| applies. It may refer to a user or a group, with the flag | ||||
| bit ACE4_IDENTIFIER_GROUP specifying which. | ||||
| </t> | ||||
| <t> | ||||
| There are several special identifiers that need to be | ||||
| understood universally, rather than in the context of a | ||||
| particular DNS domain. Some of these identifiers cannot be | ||||
| understood when an NFS client accesses the server, but | ||||
| have meaning when a local process accesses the file. The | ||||
| ability to display and modify these permissions is | ||||
| permitted over NFS, even if none of the access methods on | ||||
| the server understands the identifiers. | ||||
| </t> | ||||
| <table anchor="specialwho" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Who</th> | ||||
| <th align="left">Description</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">OWNER</td> | ||||
| <td align="left"> | ||||
| The owner of the file. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GROUP</td> | ||||
| <td align="left"> | ||||
| The group associated with the file. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">EVERYONE</td> | ||||
| <td align="left"> | ||||
| The world, including the owner and owning group. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">INTERACTIVE</td> | ||||
| <td align="left"> | ||||
| Accessed from an interactive terminal. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NETWORK</td> | ||||
| <td align="left"> | ||||
| Accessed via the network. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DIALUP</td> | ||||
| <td align="left"> | ||||
| Accessed as a dialup user to the server. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">BATCH</td> | ||||
| <td align="left"> | ||||
| Accessed from a batch job. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">ANONYMOUS</td> | ||||
| <td align="left"> | ||||
| Accessed without any authentication. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">AUTHENTICATED</td> | ||||
| <td align="left"> | ||||
| Any authenticated user (opposite of | ||||
| ANONYMOUS). | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SERVICE</td> | ||||
| <td align="left"> | ||||
| Accessed from a system service. | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| To avoid conflict, these special identifiers are | ||||
| distinguished by an appended "@" and should appear in the | ||||
| form "xxxx@" (with no domain name after the "@"), for | ||||
| example, ANONYMOUS@. | ||||
| </t> | ||||
| <t> | ||||
| The ACE4_IDENTIFIER_GROUP flag <bcp14>MUST</bcp14> be ignored on | ||||
| entries with these special identifiers. When encoding | ||||
| entries with these special identifiers, the | ||||
| ACE4_IDENTIFIER_GROUP flag <bcp14>SHOULD</bcp14> be set to zero. | ||||
| </t> | ||||
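| <t> | ||||
| As an illustration of the rules for special identifiers, | ||||
| the following sketch recognizes the identifiers from the | ||||
| table above in their "xxxx@" form and handles the | ||||
| ACE4_IDENTIFIER_GROUP flag accordingly. The helper names | ||||
| are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Flag bit as defined by NFSv4.1.
ACE4_IDENTIFIER_GROUP = 0x00000040

# The special identifiers, written with the trailing "@".
SPECIAL_WHO = frozenset([
    "OWNER@", "GROUP@", "EVERYONE@", "INTERACTIVE@", "NETWORK@",
    "DIALUP@", "BATCH@", "ANONYMOUS@", "AUTHENTICATED@", "SERVICE@",
])


def is_special_who(who):
    """True if `who` is one of the special identifiers."""
    return who in SPECIAL_WHO


def flags_for_encoding(who, flags):
    """Clear ACE4_IDENTIFIER_GROUP when encoding a special identifier."""
    if is_special_who(who):
        return flags & ~ACE4_IDENTIFIER_GROUP
    return flags


def names_ordinary_group(who, flags):
    """True if `who` is an ordinary group name.

    The flag is ignored for special identifiers, as required.
    """
    if is_special_who(who):
        return False
    return bool(flags & ACE4_IDENTIFIER_GROUP)
```
| ]]></sourcecode> | ||||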
| <section numbered="true" toc="default"> | ||||
| <name>Discussion of EVERYONE@</name> | ||||
| <t> | ||||
| It is important to note that "EVERYONE@" is not | ||||
| equivalent to the UNIX "other" entity. This is | ||||
| because, by definition, UNIX "other" does not include | ||||
| the owner or owning group of a file. "EVERYONE@" means | ||||
| literally everyone, including the owner and owning | ||||
| group. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="attrdef_dacl" numbered="true" toc="default"> | ||||
| <name>Attribute 58: dacl</name> | ||||
| <t> | ||||
| The dacl attribute is like the acl attribute, | ||||
| but dacl allows | ||||
| just ALLOW and DENY ACEs. The dacl | ||||
| attribute supports automatic inheritance (see | ||||
| <xref target="auto_inherit" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="attrdef_sacl" numbered="true" toc="default"> | ||||
| <name>Attribute 59: sacl</name> | ||||
| <t> | ||||
| The sacl attribute is like the acl attribute, | ||||
| but sacl allows | ||||
| just AUDIT and ALARM ACEs. The sacl | ||||
| attribute supports automatic inheritance (see | ||||
| <xref target="auto_inherit" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="attrdef_mode" numbered="true" toc="default"> | ||||
| <name>Attribute 33: mode</name> | ||||
| <t> | ||||
| The NFSv4.1 mode attribute is based on the UNIX mode | ||||
| bits. The following bits are defined: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const MODE4_SUID = 0x800; /* set user id on execution */ | ||||
| const MODE4_SGID = 0x400; /* set group id on execution */ | ||||
| const MODE4_SVTX = 0x200; /* save text even after use */ | ||||
| const MODE4_RUSR = 0x100; /* read permission: owner */ | ||||
| const MODE4_WUSR = 0x080; /* write permission: owner */ | ||||
| const MODE4_XUSR = 0x040; /* execute permission: owner */ | ||||
| const MODE4_RGRP = 0x020; /* read permission: group */ | ||||
| const MODE4_WGRP = 0x010; /* write permission: group */ | ||||
| const MODE4_XGRP = 0x008; /* execute permission: group */ | ||||
| const MODE4_ROTH = 0x004; /* read permission: other */ | ||||
| const MODE4_WOTH = 0x002; /* write permission: other */ | ||||
| const MODE4_XOTH = 0x001; /* execute permission: other */ | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the | ||||
| principal identified in the owner attribute. Bits MODE4_RGRP, | ||||
| MODE4_WGRP, and MODE4_XGRP apply to principals identified in | ||||
| the owner_group attribute but who are not identified in the | ||||
| owner attribute. Bits MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH apply | ||||
| to any principal that does not match that in the owner | ||||
| attribute and does not have a group matching that of the | ||||
| owner_group attribute. | ||||
| </t> | ||||
| <t> | ||||
| Bits within a mode other than those specified above | ||||
| are not defined by this protocol. A server | ||||
| <bcp14>MUST NOT</bcp14> return bits other than those defined above in a | ||||
| GETATTR or READDIR operation, and it <bcp14>MUST</bcp14> return NFS4ERR_INVAL | ||||
| if bits other than those defined above are set in a SETATTR, | ||||
| CREATE, OPEN, VERIFY, or NVERIFY operation. | ||||
| </t> | ||||
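| <t> | ||||
| A minimal sketch of the two rules above: rejecting | ||||
| undefined mode bits, and selecting which permission | ||||
| triple applies to a principal. The function names and the | ||||
| is_owner and in_owner_group parameters are hypothetical; | ||||
| how a server determines group membership is outside the | ||||
| scope of the sketch. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# The twelve defined mode bits (MODE4_SUID through MODE4_XOTH).
MODE4_DEFINED = 0xFFF
NFS4_OK = 0
NFS4ERR_INVAL = 22


def check_mode(mode):
    """Reject any mode value that sets a bit the protocol does not define."""
    if mode & ~MODE4_DEFINED:
        return NFS4ERR_INVAL
    return NFS4_OK


def applicable_bits(mode, is_owner, in_owner_group):
    """Return the read/write/execute triple (values 4, 2, 1) that applies."""
    if is_owner:
        shift = 6   # MODE4_RUSR, MODE4_WUSR, MODE4_XUSR
    elif in_owner_group:
        shift = 3   # MODE4_RGRP, MODE4_WGRP, MODE4_XGRP
    else:
        shift = 0   # MODE4_ROTH, MODE4_WOTH, MODE4_XOTH
    return (mode >> shift) & 0x7
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Note that an owner who is also a member of the owning | ||||
| group is governed by the owner bits only, which matches | ||||
| the description above. | ||||
| </t> | ||||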
| </section> | ||||
| <section anchor="attrdef_mode_set_masked" numbered="true" toc="default"> | ||||
| <name>Attribute 74: mode_set_masked</name> | ||||
| <t> | ||||
| The mode_set_masked attribute is a write-only attribute | ||||
| that allows individual bits in the mode attribute to be | ||||
| set or reset, without changing others. It allows, for | ||||
| example, the bits MODE4_SUID, MODE4_SGID, and MODE4_SVTX | ||||
| to be modified while leaving unmodified any of the | ||||
| nine low-order mode bits devoted to permissions. | ||||
| </t> | ||||
| <t> | ||||
| When the nine low-order bits are left | ||||
| unmodified, neither the acl nor the dacl attribute | ||||
| should be automatically modified as discussed in | ||||
| <xref target="setattr" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The mode_set_masked attribute consists of two words, | ||||
| each in the form of a mode4. The first consists of the | ||||
| value to be applied to the current mode value and the | ||||
| second is a mask. Only bits set to one in the mask word | ||||
| are changed (set or reset) in the file's mode. All | ||||
| other bits in the mode remain unchanged. Bits in the | ||||
| first word that correspond to bits that are zero in | ||||
| the mask are ignored, except that undefined bits are | ||||
| checked for validity and can result in NFS4ERR_INVAL as | ||||
| described below. | ||||
| </t> | ||||
| <t> | ||||
| The mode_set_masked attribute is only valid in a SETATTR | ||||
| operation. If it is used in a CREATE or OPEN operation, the | ||||
| server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| </t> | ||||
| <t> | ||||
| Bits not defined as valid in the mode attribute are not | ||||
| valid in either word of the mode_set_masked attribute. | ||||
| The server <bcp14>MUST</bcp14> return NFS4ERR_INVAL | ||||
| if any such bits are set to one in a SETATTR. | ||||
| If the mode and | ||||
| mode_set_masked attributes are both specified in the | ||||
| same SETATTR, the server <bcp14>MUST</bcp14> also return NFS4ERR_INVAL. | ||||
| </t> | ||||
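| <t> | ||||
| The effect of the two words of mode_set_masked can be | ||||
| sketched as follows. The function name | ||||
| apply_mode_set_masked and its tuple return value are | ||||
| hypothetical; the sketch covers only the bit arithmetic | ||||
| and the validity check, not the rule about specifying | ||||
| mode and mode_set_masked in the same SETATTR. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# The twelve defined mode bits (MODE4_SUID through MODE4_XOTH).
MODE4_DEFINED = 0xFFF
NFS4_OK = 0
NFS4ERR_INVAL = 22


def apply_mode_set_masked(current_mode, value, mask):
    """Return (status, new_mode) for a SETATTR of mode_set_masked."""
    # Undefined bits are invalid in either word, even where the mask
    # would otherwise cause the value bit to be ignored.
    if (value | mask) & ~MODE4_DEFINED:
        return NFS4ERR_INVAL, current_mode
    # Bits selected by the mask come from value; all others are kept.
    new_mode = (current_mode & ~mask) | (value & mask)
    return NFS4_OK, new_mode
```
| ]]></sourcecode> | ||||
| <t> | ||||
| For example, with a current mode of 0644 (octal), a value | ||||
| of MODE4_SGID, and a mask covering MODE4_SUID, MODE4_SGID, | ||||
| and MODE4_SVTX, the sketch yields 02644: the set-group-id | ||||
| bit is turned on and the nine permission bits are | ||||
| untouched. | ||||
| </t> | ||||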
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Common Methods</name> | ||||
| <t> | ||||
| The requirements in this section will be referred to in future | ||||
| sections, especially <xref target="aclreqs" format="default"/>. | ||||
| </t> | ||||
| <section anchor="useacl" numbered="true" toc="default"> | ||||
| <name>Interpreting an ACL</name> | ||||
| <section anchor="serverinterp" numbered="true" toc="default"> | ||||
| <name>Server Considerations</name> | ||||
| <t> | ||||
| The server uses the algorithm described in | ||||
| <xref target="attrdef_acl" format="default"/> to determine whether an ACL | ||||
| allows access to an object. However, the ACL might not be | ||||
| the sole determiner of access. For example: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| In the case of a file system exported as read-only, | ||||
| the server may deny write access even though | ||||
| an object's ACL grants it. | ||||
| </li> | ||||
| <li> | ||||
| Server implementations <bcp14>MAY</bcp14> grant ACE4_WRITE_ACL | ||||
| and ACE4_READ_ACL permissions to prevent | ||||
| a situation from arising in which there is no valid | ||||
| way to ever modify the ACL. | ||||
| </li> | ||||
| <li> | ||||
| All servers will allow a user the ability to read | ||||
| the data of the file when only the execute | ||||
| permission is granted (i.e., if the ACL denies the | ||||
| user the ACE4_READ_DATA access and allows the user | ||||
| ACE4_EXECUTE, the server will allow the user to | ||||
| read the data of the file). | ||||
| </li> | ||||
| <li> | ||||
| Many servers have the notion of owner-override in | ||||
| which the owner of the object is allowed to | ||||
| override accesses that are denied by the ACL. | ||||
| This may be helpful, for example, to allow users | ||||
| continued access to open files on which the | ||||
| permissions have changed. | ||||
| </li> | ||||
| <li> | ||||
| Many servers have the notion of a | ||||
| "superuser" that has privileges beyond | ||||
| an ordinary user. The superuser may be able | ||||
| to read or write data or metadata in ways that would | ||||
| not be permitted by the ACL. | ||||
| </li> | ||||
| <li> | ||||
| A retention attribute might also block access otherwise | ||||
| allowed by ACLs (see <xref target="retention" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="clientinterp" numbered="true" toc="default"> | ||||
| <name>Client Considerations</name> | ||||
| <t> | ||||
| Clients <bcp14>SHOULD NOT</bcp14> do their own access checks based on | ||||
| their interpretation of the ACL, but rather use the OPEN and | ||||
| ACCESS operations to do access checks. This allows the | ||||
| client to act on the results of having the server | ||||
| determine whether or not access should be granted based on | ||||
| its interpretation of the ACL. | ||||
| </t> | ||||
| <t> | ||||
| Clients must be aware of situations in which an object's | ||||
| ACL will define a certain access even though the server | ||||
| will not enforce it. In general, but especially in these | ||||
| situations, the client needs to do its part in the | ||||
| enforcement of access as defined by the ACL. To do this, | ||||
| the client <bcp14>MAY</bcp14> send the appropriate ACCESS operation | ||||
| prior to servicing the request of the user or application | ||||
| in order to determine whether the user or application | ||||
| should be granted the access requested. For examples in | ||||
| which the ACL may define accesses that the server doesn't | ||||
| enforce, see <xref target="serverinterp" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="computemode" numbered="true" toc="default"> | ||||
| <name>Computing a Mode Attribute from an ACL</name> | ||||
| <t> | ||||
| The following method can be used to calculate the MODE4_R*, | ||||
| MODE4_W*, and MODE4_X* bits of a mode attribute, based upon | ||||
| an ACL. | ||||
| </t> | ||||
| <t> | ||||
| First, for each of the special identifiers OWNER@, GROUP@, and | ||||
| EVERYONE@, evaluate the ACL in order, considering only ALLOW | ||||
| and DENY ACEs for the identifier EVERYONE@ and for the | ||||
| identifier under consideration. The result of the evaluation | ||||
| will be an NFSv4 ACL mask showing exactly which bits are | ||||
| permitted to that identifier. | ||||
| </t> | ||||
| <t> | ||||
| Then translate the calculated mask for OWNER@, GROUP@, and | ||||
| EVERYONE@ into mode bits for, respectively, the user, group, | ||||
| and other, as follows: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Set the read bit (MODE4_RUSR, MODE4_RGRP, or | ||||
| MODE4_ROTH) if and only if ACE4_READ_DATA is set in | ||||
| the corresponding mask. | ||||
| </li> | ||||
| <li> | ||||
| Set the write bit (MODE4_WUSR, MODE4_WGRP, or | ||||
| MODE4_WOTH) if and only if ACE4_WRITE_DATA and | ||||
| ACE4_APPEND_DATA are both set in the corresponding | ||||
| mask. | ||||
| </li> | ||||
| <li> | ||||
| Set the execute bit (MODE4_XUSR, MODE4_XGRP, or | ||||
| MODE4_XOTH) if and only if ACE4_EXECUTE is set in the | ||||
| corresponding mask. | ||||
| </li> | ||||
| </ol> | ||||
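| <t> | ||||
| The method above can be sketched in Python as follows. This is an | ||||
| illustrative sketch under stated assumptions, not a normative | ||||
| algorithm: the Ace tuple and the function names permitted_mask and | ||||
| mode_from_acl are hypothetical, each ACE is reduced to its type, who, | ||||
| and access mask, and inheritance flags are ignored. "Evaluate in | ||||
| order" is taken to mean that the first ALLOW or DENY ACE to mention | ||||
| a bit decides that bit. | ||||
| </t> | ||||

```python
from typing import List, NamedTuple

# ACE types and access mask bits (values as defined for the acl attribute).
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020

# (read, write, execute) mode bits for each special identifier.
MODE_BITS = {
    "OWNER@": (0x100, 0x080, 0x040),     # MODE4_RUSR, MODE4_WUSR, MODE4_XUSR
    "GROUP@": (0x020, 0x010, 0x008),     # MODE4_RGRP, MODE4_WGRP, MODE4_XGRP
    "EVERYONE@": (0x004, 0x002, 0x001),  # MODE4_ROTH, MODE4_WOTH, MODE4_XOTH
}


class Ace(NamedTuple):
    """Hypothetical, simplified ACE: type, who, and access mask only."""
    type: int
    who: str
    mask: int


def permitted_mask(acl: List[Ace], who: str) -> int:
    """Return the access bits permitted to `who`, looking only at ALLOW
    and DENY ACEs for `who` and for EVERYONE@, in ACL order."""
    allowed = 0
    decided = 0  # bits already settled by an earlier ACE
    for ace in acl:
        if ace.type not in (ACE4_ACCESS_ALLOWED_ACE_TYPE,
                            ACE4_ACCESS_DENIED_ACE_TYPE):
            continue  # AUDIT and ALARM ACEs play no part here
        if ace.who not in (who, "EVERYONE@"):
            continue
        undecided = ace.mask & ~decided
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            allowed |= undecided
        decided |= undecided
    return allowed


def mode_from_acl(acl: List[Ace]) -> int:
    """Compute the nine low-order mode bits from an ACL."""
    mode = 0
    for who, (r_bit, w_bit, x_bit) in MODE_BITS.items():
        mask = permitted_mask(acl, who)
        if mask & ACE4_READ_DATA:
            mode |= r_bit
        # The write bit needs both WRITE_DATA and APPEND_DATA.
        if mask & ACE4_WRITE_DATA and mask & ACE4_APPEND_DATA:
            mode |= w_bit
        if mask & ACE4_EXECUTE:
            mode |= x_bit
    return mode
```

| <t> | ||||
| For instance, under this sketch an ACL that allows OWNER@ all four | ||||
| bits, denies GROUP@ write and append, and allows EVERYONE@ read and | ||||
| write yields mode 0744: EVERYONE@ lacks ACE4_APPEND_DATA, so no write | ||||
| bit is set for group or other. | ||||
| </t> | ||||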
| <section numbered="true" toc="default"> | ||||
| <name>Discussion</name> | ||||
| <t> | ||||
| Some server implementations also add bits permitted to | ||||
| named users and groups to the group bits (MODE4_RGRP, | ||||
| MODE4_WGRP, and MODE4_XGRP). | ||||
| </t> | ||||
| <t> | ||||
| Implementations are discouraged from doing this, because | ||||
| it has been found to cause confusion for users who see | ||||
| members of a file's group denied access that the mode | ||||
| bits appear to allow. (The presence of DENY ACEs may also | ||||
| lead to such behavior, but DENY ACEs are expected to be | ||||
| more rarely used.) | ||||
| </t> | ||||
| <t> | ||||
| The same user confusion seen when fetching the mode also | ||||
| results if setting the mode does not effectively control | ||||
| permissions for the owner, group, and other users; this | ||||
| motivates some of the requirements that follow. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="aclreqs" numbered="true" toc="default"> | ||||
| <name>Requirements</name> | ||||
| <t> | ||||
| A server that supports both mode and ACL must take care to | ||||
| synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with | ||||
| the ACEs that have respective who fields of "OWNER@", "GROUP@", | ||||
| and "EVERYONE@". This way, the client can see if semantically equivalent | ||||
| access permissions exist whether the client asks for the owner, | ||||
| owner_group, and mode attributes or for just the ACL. | ||||
| </t> | ||||
| <t> | ||||
| In this section, much is made of the methods in <xref target="computemode" format="default"/>. Many requirements refer to that section. | ||||
| But note that the methods have behaviors specified with | ||||
| "<bcp14>SHOULD</bcp14>". This is intentional, to avoid invalidating | ||||
| existing implementations that compute the mode according to the | ||||
| withdrawn POSIX ACL draft (1003.1e draft 17), rather than by | ||||
| actual permissions on owner, group, and other. | ||||
| </t> | ||||
| <section anchor="setattr" numbered="true" toc="default"> | ||||
| <name>Setting the Mode and/or ACL Attributes</name> | ||||
| <t> | ||||
| In the case where a server supports the sacl or | ||||
| dacl attribute, in addition to the acl attribute, | ||||
| the server <bcp14>MUST</bcp14> fail a request to set the acl | ||||
| attribute simultaneously with a dacl or sacl | ||||
| attribute. The error to be given is NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
| <section anchor="setmode" numbered="true" toc="default"> | ||||
| <name>Setting Mode and not ACL</name> | ||||
| <t> | ||||
| When any of the nine low-order mode bits | ||||
| are subject to change, either because the mode | ||||
| attribute was set or because the mode_set_masked | ||||
| attribute was set and the mask included one or more | ||||
| bits from the nine low-order mode bits, | ||||
| and no ACL attribute is explicitly | ||||
| set, the acl and dacl attributes must be modified | ||||
| in accordance with the updated value of those bits. | ||||
| This must happen | ||||
| even if the value of the low-order bits | ||||
| is the same after the mode is set as before. | ||||
| </t> | ||||
| <t> | ||||
| Note that any AUDIT or ALARM ACEs (hence any ACEs in the | ||||
| sacl attribute) are unaffected by changes to the mode. | ||||
| </t> | ||||
| <t> | ||||
| In cases in which the permissions bits are subject to | ||||
| change, the acl and dacl attributes | ||||
| <bcp14>MUST</bcp14> be modified such that the mode computed via the | ||||
| method in | ||||
| <xref target="computemode" format="default"/> | ||||
| yields the low-order nine bits (MODE4_R*, MODE4_W*, | ||||
| MODE4_X*) of the mode attribute as modified by the | ||||
| attribute change. The ACL attributes | ||||
| <bcp14>SHOULD</bcp14> also be modified such that: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| If MODE4_RGRP is not set, entities explicitly | ||||
| listed in the ACL other than OWNER@ and EVERYONE@ | ||||
| <bcp14>SHOULD NOT</bcp14> be granted ACE4_READ_DATA. | ||||
| </li> | ||||
| <li> | ||||
| If MODE4_WGRP is not set, entities explicitly | ||||
| listed in the ACL other than OWNER@ and | ||||
| EVERYONE@ <bcp14>SHOULD NOT</bcp14> be granted | ||||
| ACE4_WRITE_DATA or ACE4_APPEND_DATA. | ||||
| </li> | ||||
| <li> | ||||
| If MODE4_XGRP is not set, entities explicitly | ||||
| listed in the ACL other than OWNER@ and EVERYONE@ | ||||
| <bcp14>SHOULD NOT</bcp14> be granted ACE4_EXECUTE. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| Access mask bits other than those listed above, appearing | ||||
| in ALLOW ACEs, <bcp14>MAY</bcp14> also be disabled. | ||||
| </t> | ||||
| <t> | ||||
| Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do | ||||
| not affect the permissions of the ACL itself, nor do ACEs | ||||
| of the type AUDIT and ALARM. As such, it is desirable to | ||||
| leave these ACEs unmodified when modifying the ACL | ||||
| attributes. | ||||
| </t> | ||||
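| <t> | ||||
| The three recommendations above, together with the note on | ||||
| ACE4_INHERIT_ONLY_ACE, AUDIT, and ALARM ACEs, might be applied as in | ||||
| the following Python sketch. The Ace tuple and the function name | ||||
| clamp_to_group_bits are hypothetical. The sketch covers only the | ||||
| SHOULD-level adjustments; a server would still have to make the ACL | ||||
| satisfy the MUST-level requirement that the computed mode match. | ||||
| </t> | ||||

```python
from typing import List, NamedTuple

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_INHERIT_ONLY_ACE = 0x00000008
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020
MODE4_RGRP = 0x020
MODE4_WGRP = 0x010
MODE4_XGRP = 0x008


class Ace(NamedTuple):
    """Hypothetical ACE record mirroring type, flag, access mask, who."""
    type: int
    flag: int
    mask: int
    who: str


def clamp_to_group_bits(acl: List[Ace], mode: int) -> List[Ace]:
    """Clear access bits that the group mode bits do not grant from
    effective ALLOW ACEs of every entity except OWNER@ and EVERYONE@."""
    strip = 0
    if not mode & MODE4_RGRP:
        strip |= ACE4_READ_DATA
    if not mode & MODE4_WGRP:
        strip |= ACE4_WRITE_DATA | ACE4_APPEND_DATA
    if not mode & MODE4_XGRP:
        strip |= ACE4_EXECUTE

    result = []
    for ace in acl:
        affected = (
            ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE  # DENY/AUDIT/ALARM untouched
            and not ace.flag & ACE4_INHERIT_ONLY_ACE  # inherit-only ACEs untouched
            and ace.who not in ("OWNER@", "EVERYONE@")
        )
        result.append(ace._replace(mask=ace.mask & ~strip) if affected else ace)
    return result
```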
| <t> | ||||
| Also note that the requirement may be met by | ||||
| discarding the acl and dacl, in favor of an ACL | ||||
| that represents the mode and only the mode. This is | ||||
| permitted, but it is preferable for a server to | ||||
| preserve as much of the ACL as possible without | ||||
| violating the above requirements. Discarding the | ||||
| ACL makes it effectively impossible for a file | ||||
| created with a mode attribute to inherit an ACL | ||||
| (see <xref target="aclcreate" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="settingacl" numbered="true" toc="default"> | ||||
| <name>Setting ACL and Not Mode</name> | ||||
| <t> | ||||
| When setting the acl or dacl and not setting the | ||||
| mode or mode_set_masked attributes, the permission | ||||
| bits of the mode need to be derived from the ACL. | ||||
| In this case, the ACL attribute <bcp14>SHOULD</bcp14> be set as | ||||
| given. The nine low-order bits of the mode | ||||
| attribute (MODE4_R*, MODE4_W*, MODE4_X*) <bcp14>MUST</bcp14> be | ||||
| modified to match the result of the method in | ||||
| <xref target="computemode" format="default"/>. The three high-order bits | ||||
| of the mode (MODE4_SUID, MODE4_SGID, MODE4_SVTX) | ||||
| <bcp14>SHOULD</bcp14> remain unchanged. | ||||
| </t> | ||||
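| <t> | ||||
| As a small illustration of the rule above, the following Python | ||||
| sketch combines an existing mode with the nine bits derived from a | ||||
| newly set ACL. The function name mode_after_acl_set is hypothetical, | ||||
| and the ACL-derived bits are assumed to have been computed already | ||||
| by the method in the section on computing a mode attribute from an | ||||
| ACL. | ||||
| </t> | ||||

```python
MODE4_SUID = 0x800
MODE4_SGID = 0x400
MODE4_SVTX = 0x200
HIGH_BITS = MODE4_SUID | MODE4_SGID | MODE4_SVTX
PERMISSION_BITS = 0o777  # the nine MODE4_R*, MODE4_W*, MODE4_X* bits


def mode_after_acl_set(old_mode: int, acl_derived_bits: int) -> int:
    """Keep the three high-order bits of the old mode and replace the
    nine low-order bits with those derived from the ACL."""
    return (old_mode & HIGH_BITS) | (acl_derived_bits & PERMISSION_BITS)
```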
| </section> | ||||
| <section anchor="setboth" numbered="true" toc="default"> | ||||
| <name>Setting Both ACL and Mode</name> | ||||
| <t> | ||||
| When setting both the mode (includes use of either the | ||||
| mode attribute or the mode_set_masked attribute) | ||||
| and the acl or dacl attributes in the | ||||
| same operation, the attributes <bcp14>MUST</bcp14> be applied in this | ||||
| order: mode (or mode_set_masked), then ACL. The | ||||
| mode-related attribute is set as given, | ||||
| then the ACL attribute is set as given, possibly changing | ||||
| the final mode, as described above in | ||||
| <xref target="settingacl" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Retrieving the Mode and/or ACL Attributes</name> | ||||
| <t> | ||||
| This section applies only to servers that support both the | ||||
| mode and ACL attributes. | ||||
| </t> | ||||
| <t> | ||||
| Some server implementations may have a concept of | ||||
| "objects without ACLs", meaning that all permissions | ||||
| are granted and denied according to the mode attribute and | ||||
| that no ACL attribute is stored for that object. If an ACL | ||||
| attribute is requested of such a server, the server <bcp14>SHOULD</bcp14> | ||||
| return an ACL that does not conflict with the mode; that is to | ||||
| say, the ACL returned <bcp14>SHOULD</bcp14> represent the nine low-order bits | ||||
| of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as | ||||
| described in <xref target="computemode" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| For other server implementations, the ACL attribute is always | ||||
| present for every object. Such servers <bcp14>SHOULD</bcp14> store at least | ||||
| the three high-order bits of the mode attribute (MODE4_SUID, | ||||
| MODE4_SGID, MODE4_SVTX). The server <bcp14>SHOULD</bcp14> return a mode | ||||
| attribute if one is requested, and the low-order nine bits of | ||||
| the mode (MODE4_R*, MODE4_W*, MODE4_X*) <bcp14>MUST</bcp14> match the result | ||||
| of applying the method in | ||||
| <xref target="computemode" format="default"/> to the ACL attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="aclcreate" numbered="true" toc="default"> | ||||
| <name>Creating New Objects</name> | ||||
| <t> | ||||
| If a server supports any ACL attributes, it may use the ACL | ||||
| attributes on the parent directory to compute an initial ACL | ||||
| attribute for a newly created object. This will be referred to | ||||
| as the inherited ACL within this section. The act of adding | ||||
| one or more ACEs to the inherited ACL that are based upon ACEs | ||||
| in the parent directory's ACL will be referred to as | ||||
| inheriting an ACE within this section. | ||||
| </t> | ||||
| <t> | ||||
| Implementors should standardize the behavior of CREATE | ||||
| and OPEN according to the presence or absence of the | ||||
| mode and ACL attributes, as follows: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| <t>If just the mode is given in the call: | ||||
| </t> | ||||
| <t> In this case, inheritance | ||||
| <bcp14>SHOULD</bcp14> take place, but the mode <bcp14>MUST</bcp14> be applied to the | ||||
| inherited ACL as described in <xref target="setmode" format="default"/>, thereby modifying the ACL. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t>If just the ACL is given in the call: | ||||
| </t> | ||||
| <t> | ||||
| In this case, inheritance <bcp14>SHOULD NOT</bcp14> take place, and | ||||
| the ACL as defined in the CREATE or OPEN will be set | ||||
| without modification, and the mode modified as in | ||||
| <xref target="settingacl" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t>If both mode and ACL are given in the call: | ||||
| </t> | ||||
| <t> In this case, inheritance | ||||
| <bcp14>SHOULD NOT</bcp14> take place, and both attributes will be set | ||||
| as described in <xref target="setboth" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| If neither mode nor ACL is given in the call: | ||||
| </t> | ||||
| <t> | ||||
| In the case where an object is being created without | ||||
| any initial attributes at all, e.g., an OPEN operation | ||||
| with an opentype4 of OPEN4_CREATE and a createmode4 of | ||||
| EXCLUSIVE4, inheritance <bcp14>SHOULD NOT</bcp14> take place (note that | ||||
| EXCLUSIVE4_1 is a better choice of createmode4, since it | ||||
| does permit initial attributes). | ||||
| Instead, the server <bcp14>SHOULD</bcp14> set permissions to deny all | ||||
| access to the newly created object. It is expected | ||||
| that the appropriate client will set the desired | ||||
| attributes in a subsequent SETATTR operation, and the | ||||
| server <bcp14>SHOULD</bcp14> allow that operation to succeed, | ||||
| regardless of what permissions the object is created | ||||
| with. For example, an empty ACL denies all | ||||
| permissions, but the server should allow the owner's | ||||
| SETATTR to succeed even though WRITE_ACL is implicitly | ||||
| denied. | ||||
| </t> | ||||
| <t> | ||||
| In other cases, inheritance <bcp14>SHOULD</bcp14> take place, and no | ||||
| modifications to the ACL will happen. The mode | ||||
| attribute, if supported, <bcp14>MUST</bcp14> be as computed in | ||||
| <xref target="computemode" format="default"/>, with the MODE4_SUID, | ||||
| MODE4_SGID, and MODE4_SVTX bits clear. | ||||
| If no inheritable ACEs exist on the parent directory, | ||||
| the rules for creating acl, dacl, or sacl attributes | ||||
| are implementation defined. | ||||
| If either the dacl or sacl attribute is supported, | ||||
| then the ACL4_DEFAULTED flag <bcp14>SHOULD</bcp14> be set on the | ||||
| newly created attributes. | ||||
| </t> | ||||
| </li> | ||||
| </ol> | ||||
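| <t> | ||||
| The four cases above amount to a small decision table, sketched here | ||||
| in Python. The Plan tuple, the function name creation_plan, and the | ||||
| no_attrs_at_all parameter are hypothetical; the last stands for the | ||||
| situation in which the object is created without any initial | ||||
| attributes at all, as with a createmode4 of EXCLUSIVE4. The action | ||||
| strings are informal summaries, not protocol elements. | ||||
| </t> | ||||

```python
from typing import NamedTuple


class Plan(NamedTuple):
    inherit: bool  # whether ACL inheritance from the parent takes place
    action: str    # informal summary of what the server then does


def creation_plan(mode_given: bool, acl_given: bool,
                  no_attrs_at_all: bool = False) -> Plan:
    """Map the attributes present in CREATE or OPEN to the behavior
    described by the four cases."""
    if mode_given and acl_given:
        return Plan(False, "set mode, then set ACL as given")
    if mode_given:
        return Plan(True, "apply mode to the inherited ACL")
    if acl_given:
        return Plan(False, "set ACL as given, derive mode from it")
    if no_attrs_at_all:
        return Plan(False, "deny all access, await SETATTR")
    return Plan(True, "keep inherited ACL, derive mode from it")
```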
| <section anchor="inheritreq" numbered="true" toc="default"> | ||||
| <name>The Inherited ACL</name> | ||||
| <t> | ||||
| If the object being created is not a directory, the | ||||
| inherited ACL <bcp14>SHOULD NOT</bcp14> inherit ACEs from the parent | ||||
| directory ACL unless the ACE4_FILE_INHERIT_ACE flag is set. | ||||
| </t> | ||||
| <t> | ||||
| If the object being created is a directory, the inherited | ||||
| ACL should inherit all inheritable ACEs from the parent | ||||
| directory, that is, those that have the ACE4_FILE_INHERIT_ACE or | ||||
| ACE4_DIRECTORY_INHERIT_ACE flag set. | ||||
| If the inheritable | ||||
| ACE has ACE4_FILE_INHERIT_ACE set but | ||||
| ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on | ||||
| the newly created directory <bcp14>MUST</bcp14> have the | ||||
| ACE4_INHERIT_ONLY_ACE flag set to prevent the directory | ||||
| from being affected by ACEs meant for non-directories. | ||||
| </t> | ||||
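| <t> | ||||
| The selection of inherited ACEs described above might look like the | ||||
| following Python sketch. The Ace tuple and the function name | ||||
| inherited_acl are hypothetical. Two points are assumptions of the | ||||
| sketch rather than statements of the text: inheritance flags are | ||||
| cleared on ACEs inherited by non-directories, and all other flags, | ||||
| including ACE4_NO_PROPAGATE_INHERIT_ACE, are copied unchanged. | ||||
| </t> | ||||

```python
from typing import List, NamedTuple

ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004
ACE4_INHERIT_ONLY_ACE = 0x00000008
INHERITANCE_FLAGS = (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE |
                     ACE4_NO_PROPAGATE_INHERIT_ACE | ACE4_INHERIT_ONLY_ACE)


class Ace(NamedTuple):
    """Hypothetical ACE record mirroring type, flag, access mask, who."""
    type: int
    flag: int
    mask: int
    who: str


def inherited_acl(parent_acl: List[Ace], is_directory: bool) -> List[Ace]:
    """Build the inherited ACL for a new object from its parent's ACL."""
    result = []
    for ace in parent_acl:
        file_inherit = bool(ace.flag & ACE4_FILE_INHERIT_ACE)
        dir_inherit = bool(ace.flag & ACE4_DIRECTORY_INHERIT_ACE)
        if is_directory:
            if not (file_inherit or dir_inherit):
                continue
            flag = ace.flag
            if file_inherit and not dir_inherit:
                # Meant for non-directories only: keep it inheritable
                # but prevent it from affecting the new directory.
                flag |= ACE4_INHERIT_ONLY_ACE
        else:
            if not file_inherit:
                continue
            # Assumption: a non-directory has no use for inheritance flags.
            flag = ace.flag & ~INHERITANCE_FLAGS
        result.append(ace._replace(flag=flag))
    return result
```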
| <t> | ||||
| When a new directory is created, the server <bcp14>MAY</bcp14> split | ||||
| any inherited ACE that is both inheritable and effective | ||||
| (in other words, that has neither ACE4_INHERIT_ONLY_ACE | ||||
| nor ACE4_NO_PROPAGATE_INHERIT_ACE set), into two ACEs, | ||||
| one with no inheritance flags and one with | ||||
| ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or | ||||
| sacl attribute, both of those ACEs <bcp14>SHOULD</bcp14> also have the | ||||
| ACE4_INHERITED_ACE flag set.) This makes it simpler to | ||||
| modify the effective permissions on the directory | ||||
| without modifying the ACE that is to be inherited to the | ||||
| new directory's children. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="auto_inherit" numbered="true" toc="default"> | ||||
| <name>Automatic Inheritance</name> | ||||
| <t> | ||||
| The acl attribute consists only of an array of ACEs, but | ||||
| the <xref target="attrdef_sacl" format="default">sacl</xref> | ||||
| and <xref target="attrdef_dacl" format="default">dacl</xref> attributes | ||||
| also include an additional flag field. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct nfsacl41 { | ||||
| aclflag4 na41_flag; | ||||
| nfsace4 na41_aces<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The flag field | ||||
| applies to the entire sacl or dacl; three flag values are | ||||
| defined: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACL4_AUTO_INHERIT = 0x00000001; | ||||
| const ACL4_PROTECTED = 0x00000002; | ||||
| const ACL4_DEFAULTED = 0x00000004; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| and all other bits must be cleared. The | ||||
| ACE4_INHERITED_ACE flag may be set in the ACEs of the sacl | ||||
| or dacl (whereas it must always be cleared in the acl). | ||||
| </t> | ||||
| <t> | ||||
| Together these features allow a server to support automatic | ||||
| inheritance, which we now explain in more detail. | ||||
| </t> | ||||
| <t> | ||||
| Inheritable ACEs are normally inherited by child objects only | ||||
| at the time that the child objects are created; later | ||||
| modifications to inheritable ACEs do not result in | ||||
| modifications to inherited ACEs on descendants. | ||||
| </t> | ||||
| <t> | ||||
| However, the dacl and sacl provide an <bcp14>OPTIONAL</bcp14> mechanism | ||||
| that allows a client application to propagate changes to | ||||
| inheritable ACEs to an entire directory hierarchy. | ||||
| </t> | ||||
| <t> | ||||
| A server that supports this performs inheritance at object | ||||
| creation time in the normal way, and <bcp14>SHOULD</bcp14> set the | ||||
| ACE4_INHERITED_ACE flag on any inherited ACEs as they are | ||||
| added to the new object. | ||||
| </t> | ||||
| <t> | ||||
| A client application such as an ACL editor may then propagate | ||||
| changes to inheritable ACEs on a directory by recursively | ||||
| traversing that directory's descendants and modifying each ACL | ||||
| encountered to remove any ACEs with the ACE4_INHERITED_ACE flag | ||||
| and to replace them by the new inheritable ACEs (also with the | ||||
| ACE4_INHERITED_ACE flag set). It uses the existing ACE | ||||
| inheritance flags in the obvious way to decide which ACEs to | ||||
| propagate. (Note that it may encounter further inheritable | ||||
| ACEs when descending the directory hierarchy and that those | ||||
| will also need to be taken into account when propagating | ||||
| inheritable ACEs to further descendants.) | ||||
| </t> | ||||
| <t> | ||||
| The reach of this propagation may be limited in two ways: | ||||
| first, automatic inheritance is not performed from any | ||||
| directory ACL that has the ACL4_AUTO_INHERIT flag | ||||
| cleared; and second, automatic inheritance stops wherever | ||||
| an ACL with the ACL4_PROTECTED flag is set, preventing | ||||
| modification of that ACL and also (if the ACL is set on | ||||
| a directory) of the ACL on any of the object's descendants. | ||||
| </t> | ||||
| <t> | ||||
| This propagation is performed independently for the sacl | ||||
| and the dacl attributes; thus, the ACL4_AUTO_INHERIT and | ||||
| ACL4_PROTECTED flags may be independently set for the sacl | ||||
| and the dacl, and propagation of one type of acl may continue | ||||
| down a hierarchy even where propagation of the other acl has | ||||
| stopped. | ||||
| </t> | ||||
| <t> | ||||
| New objects should be created with a dacl and a sacl that | ||||
| both have the ACL4_PROTECTED flag cleared and the | ||||
| ACL4_AUTO_INHERIT flag set to the same value as that on, | ||||
| respectively, the dacl or sacl of the parent object. | ||||
| </t> | ||||
| <t> | ||||
| Both the dacl and sacl attributes are <bcp14>RECOMMENDED</bcp14>, and a server | ||||
| may support one without supporting the other. | ||||
| </t> | ||||
| <t> | ||||
| A server that supports both the old acl attribute and | ||||
| one or both of the new dacl or sacl attributes must do so | ||||
| in such a way as to keep all three attributes consistent | ||||
| with each other. Thus, the ACEs reported in the acl attribute | ||||
| should be the union of the ACEs reported in the dacl and | ||||
| sacl attributes, except that the ACE4_INHERITED_ACE flag must | ||||
| be cleared from the ACEs in the acl. And of course a | ||||
| client that queries only the acl will be unable to determine | ||||
| the values of the sacl or dacl flag fields. | ||||
| </t> | ||||
| <t> | ||||
| When a client performs a SETATTR for the acl attribute, | ||||
| the server <bcp14>SHOULD</bcp14> set the ACL4_PROTECTED flag to true on | ||||
| both the sacl and the dacl. By using the acl attribute, | ||||
| as opposed to the dacl or sacl attributes, the client signals | ||||
| that it may not understand automatic inheritance, and thus | ||||
| cannot be trusted to set an ACL for which automatic | ||||
| inheritance would make sense. | ||||
| </t> | ||||
| <t> | ||||
| When a client application queries an ACL, modifies it, and sets | ||||
| it again, it should leave any ACEs marked with | ||||
| ACE4_INHERITED_ACE unchanged, in their original order, at the | ||||
| end of the ACL. If the application is unable to do this, it | ||||
| should set the ACL4_PROTECTED flag. This behavior | ||||
| is not enforced by servers, but violations of this rule may | ||||
| lead to unexpected results when applications perform automatic | ||||
| inheritance. | ||||
| </t> | ||||
| <t> | ||||
| If a server also supports the mode attribute, it <bcp14>SHOULD</bcp14> set the | ||||
| mode in such a way that leaves inherited ACEs unchanged, in | ||||
| their original order, at the end of the ACL. If it is unable | ||||
| to do so, it <bcp14>SHOULD</bcp14> set the ACL4_PROTECTED flag on the file's | ||||
| dacl. | ||||
| </t> | ||||
| <t>Finally, in the case where the request that creates a new file | ||||
| or directory does not also set permissions for that file or | ||||
| directory, and there are also no ACEs to inherit from the | ||||
| parent directory, then the server's choice of ACL for the new | ||||
| object is implementation-dependent. In this case, the server | ||||
| <bcp14>SHOULD</bcp14> set the ACL4_DEFAULTED flag on the ACL it chooses for | ||||
| the new object. An application performing automatic | ||||
| inheritance takes the ACL4_DEFAULTED flag as a sign that the | ||||
| ACL should be completely replaced by one generated using the | ||||
| automatic inheritance rules. | ||||
| </t> | ||||
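| <t> | ||||
| One step of the propagation performed by a client application such | ||||
| as an ACL editor might be sketched in Python as follows. The Ace | ||||
| tuple, the function name propagate, and its parameters are | ||||
| hypothetical. The sketch handles a single child, treats the dacl or | ||||
| sacl in isolation, and assumes the caller has already worked out | ||||
| which of the parent's ACEs the child is to inherit. | ||||
| </t> | ||||

```python
from typing import List, NamedTuple

ACL4_AUTO_INHERIT = 0x00000001
ACL4_PROTECTED = 0x00000002
ACL4_DEFAULTED = 0x00000004
ACE4_INHERITED_ACE = 0x00000080


class Ace(NamedTuple):
    """Hypothetical ACE record mirroring type, flag, access mask, who."""
    type: int
    flag: int
    mask: int
    who: str


def propagate(parent_aclflag: int, child_aclflag: int,
              child_aces: List[Ace], to_inherit: List[Ace]) -> List[Ace]:
    """Return the child's new ACE list after one propagation step."""
    # No automatic inheritance from a parent without ACL4_AUTO_INHERIT,
    # and none into an ACL marked ACL4_PROTECTED.
    if not parent_aclflag & ACL4_AUTO_INHERIT:
        return list(child_aces)
    if child_aclflag & ACL4_PROTECTED:
        return list(child_aces)
    if child_aclflag & ACL4_DEFAULTED:
        # A defaulted ACL is replaced completely.
        local = []
    else:
        # Drop previously inherited ACEs, keep locally set ones.
        local = [a for a in child_aces if not a.flag & ACE4_INHERITED_ACE]
    # New inherited ACEs go at the end, marked ACE4_INHERITED_ACE.
    marked = [a._replace(flag=a.flag | ACE4_INHERITED_ACE) for a in to_inherit]
    return local + marked
```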
| </section> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="single_server_namespace" numbered="true" toc="default"> | ||||
| <name>Single-Server Namespace</name> | ||||
| <t> | ||||
| This section describes the NFSv4 single-server namespace. | ||||
| Single-server namespaces may be presented directly to clients, | ||||
| or they may be used as a basis to form larger multi-server | ||||
| namespaces (e.g., site-wide or organization-wide) to be presented | ||||
| to clients, as described in <xref target="NEW11" format="default"/>. | ||||
| </t> | ||||
| <section anchor="server_exports" numbered="true" toc="default"> | ||||
| <name>Server Exports</name> | ||||
| <t> | ||||
| On a UNIX server, the namespace describes all the files reachable by | ||||
| pathnames under the root directory or "/". On a Windows server, the | ||||
| namespace constitutes all the files on disks named by mapped disk | ||||
| letters. NFS server administrators rarely make the entire server's | ||||
| file system namespace available to NFS clients. More often, portions | ||||
| of the namespace are made available via an "export" feature. In | ||||
| previous versions of the NFS protocol, the root filehandle for each | ||||
| export was obtained through the MOUNT protocol; the client sent a | ||||
| string that identified the export name within the namespace and | ||||
| the server returned the root filehandle | ||||
| for that export. The MOUNT protocol also provided an EXPORTS | ||||
| procedure that enumerated the server's exports. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="browsing_exports" numbered="true" toc="default"> | ||||
| <name>Browsing Exports</name> | ||||
| <t> | ||||
| The NFSv4.1 protocol provides a root filehandle that clients can | ||||
| use to obtain filehandles for the exports of a particular server, | ||||
| via a series of LOOKUP operations within a COMPOUND, to traverse | ||||
| a path. A common user experience is to use a graphical user interface | ||||
| (perhaps a file "Open" dialog window) to find a file via progressive | ||||
| browsing through a directory tree. The client must be able to move | ||||
| from one export to another export via single-component, progressive | ||||
| LOOKUP operations. | ||||
| </t> | ||||
| <t> | ||||
| This style of browsing is not well supported by the NFSv3 protocol. In NFSv3, the client expects all | ||||
| LOOKUP operations to remain | ||||
| within a single server file system. For example, the device attribute | ||||
| will not change. This prevents a client from taking namespace paths | ||||
| that span exports. | ||||
| </t> | ||||
| <t> | ||||
| In the case of NFSv3, an automounter on the client | ||||
| can obtain a snapshot of the server's namespace | ||||
| using the EXPORTS procedure of the MOUNT protocol. | ||||
| If it understands the server's pathname syntax, | ||||
| it can create an image of the server's namespace | ||||
| on the client. The parts of the namespace that | ||||
| are not exported by the server are filled in | ||||
| with directories that might be constructed similarly | ||||
| to an NFSv4.1 "pseudo file system" (see <xref target="server_pseudo_file_system" format="default"/>) that | ||||
| allows the user to browse from one mounted file | ||||
| system to another. There is a drawback to this | ||||
| representation of the server's namespace on the | ||||
| client: it is static. If the server administrator | ||||
| adds a new export, the client will be unaware of it. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="server_pseudo_file_system" numbered="true" toc="default"> | ||||
| <name>Server Pseudo File System</name> | ||||
| <t> | ||||
| NFSv4.1 servers avoid this namespace inconsistency by | ||||
| presenting all the exports for a given server within the | ||||
| framework of a single namespace for that server. | ||||
| An NFSv4.1 client uses LOOKUP and READDIR | ||||
| operations to browse seamlessly from one export to another. | ||||
| </t> | ||||
| <t> | ||||
| Where there are portions of the server namespace that are not | ||||
| exported, clients require some way of traversing those portions | ||||
| to reach actual exported file systems. A technique that servers | ||||
| may use to provide for this is to bridge the unexported portion of | ||||
| the namespace via a | ||||
| "pseudo file system" that provides a view of exported directories | ||||
| only. A pseudo file system has a unique fsid and behaves like a | ||||
| normal, read-only file system. | ||||
| </t> | ||||
| <t> | ||||
| Based on the construction of the server's namespace, it is possible | ||||
| that multiple pseudo file systems may exist. For example, | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| /a pseudo file system | ||||
| /a/b real file system | ||||
| /a/b/c pseudo file system | ||||
| /a/b/c/d real file system | ||||
| ]]></artwork> | ||||
| <t> | ||||
| Each of the pseudo file systems is considered a separate entity and | ||||
| therefore <bcp14>MUST</bcp14> have its own fsid, unique among all the fsids for that | ||||
| server. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Multiple Roots</name> | ||||
| <t> | ||||
| Certain operating environments are sometimes described as | ||||
| having "multiple roots". In such environments, individual file | ||||
| systems are commonly represented by disk or volume names. | ||||
| NFSv4 servers for these platforms can construct a pseudo file | ||||
| system above these root names so that disk letters or volume names are | ||||
| simply directory names in the pseudo root. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="pseudo_fs_volatility" numbered="true" toc="default"> | ||||
| <name>Filehandle Volatility</name> | ||||
| <t> | ||||
| The nature of the server's pseudo file system is that it is a logical | ||||
| representation of file system(s) available from the server. | ||||
| Therefore, the pseudo file system is most likely constructed | ||||
| dynamically when the server is first instantiated. It is expected | ||||
| that the pseudo file system may not have an on-disk counterpart from | ||||
| which persistent filehandles could be constructed. Even though it is | ||||
| preferable that the server provide persistent filehandles for the | ||||
| pseudo file system, the NFS client should expect that pseudo file | ||||
| system filehandles are volatile. This can be confirmed by checking | ||||
| the associated "fh_expire_type" attribute for those filehandles in | ||||
| question. If the filehandles are volatile, the NFS client must be | ||||
| prepared to recover a filehandle value (e.g., with a series of | ||||
| LOOKUP operations) when receiving an error of NFS4ERR_FHEXPIRED. | ||||
| </t> | ||||
| <t> | ||||
| Because it is quite likely that servers will implement pseudo | ||||
| file systems using volatile filehandles, clients need to be | ||||
| prepared for them, rather than assuming that all filehandles | ||||
| will be persistent. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Exported Root</name> | ||||
| <t> | ||||
| If the server's root file system is exported, one might conclude that | ||||
| a pseudo file system is unneeded. This is not necessarily so. Assume the | ||||
| following file systems on a server: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| / fs1 (exported) | ||||
| /a fs2 (not exported) | ||||
| /a/b fs3 (exported)]]></artwork> | ||||
| <t> | ||||
| Because fs2 is not exported, fs3 cannot be reached with simple | ||||
| LOOKUPs. The server must bridge the gap with a pseudo file system. | ||||
| </t> | ||||
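| <t> | ||||
| To illustrate which directories a pseudo file system has to supply, | ||||
| the following Python sketch takes a map from file system mount | ||||
| points to an "exported" flag and returns the directories that lie | ||||
| on the way to an exported file system but belong to an unexported | ||||
| one. The function names and the input representation are | ||||
| hypothetical; a real server derives this from its own export | ||||
| configuration. | ||||
| </t> | ||||

```python
from typing import Dict, Iterator, Optional, Set


def ancestors(path: str) -> Iterator[str]:
    """Yield each proper ancestor of an absolute path, root first."""
    parts = [p for p in path.split("/") if p]
    for i in range(len(parts)):
        yield "/" + "/".join(parts[:i])


def owning_fs(path: str, mounts: Dict[str, bool]) -> Optional[str]:
    """Return the longest mount point that contains `path`, if any."""
    best = None
    for mount in mounts:
        inside = mount == "/" or path == mount or path.startswith(mount + "/")
        if inside and (best is None or len(mount) > len(best)):
            best = mount
    return best


def pseudo_dirs(mounts: Dict[str, bool]) -> Set[str]:
    """Directories the pseudo file system must provide so that every
    exported file system is reachable by LOOKUP from the root."""
    needed = set()
    for root, exported in mounts.items():
        if not exported:
            continue
        for directory in ancestors(root):
            fs = owning_fs(directory, mounts)
            if not mounts.get(fs, False):
                needed.add(directory)
    return needed
```

| <t> | ||||
| With the file systems of the example above, {"/": True, "/a": False, | ||||
| "/a/b": True}, the sketch returns only "/a", the directory that | ||||
| bridges the gap between fs1 and fs3. | ||||
| </t> | ||||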
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Mount Point Crossing</name> | ||||
| <t> | ||||
| The server file system environment may be constructed in such a way | ||||
| that one file system contains a directory that is 'covered' or | ||||
| mounted upon by a second file system. For example: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| /a/b (file system 1) | ||||
| /a/b/c/d (file system 2)]]></artwork> | ||||
| <t> | ||||
| The pseudo file system for this server may be constructed to look | ||||
| like: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| / (place holder/not exported) | ||||
| /a/b (file system 1) | ||||
| /a/b/c/d (file system 2)]]></artwork> | ||||
| <t> | ||||
| It is the server's responsibility to present to the client a pseudo | ||||
| file system that is complete. If the client sends a LOOKUP request | ||||
| for the path /a/b/c/d, the server's response is the filehandle of | ||||
| the root of the file system /a/b/c/d. In previous versions of the | ||||
| NFS protocol, | ||||
| the server would respond with the filehandle of directory | ||||
| /a/b/c/d within the file system /a/b. | ||||
| </t> | ||||
| <t> | ||||
| The NFS client will be able to determine if it crosses a server mount | ||||
| point by a change in the value of the "fsid" attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Security Policy and Namespace Presentation</name> | ||||
| <t> | ||||
| Because NFSv4 clients possess the ability to change the security | ||||
| mechanisms used, after determining what is allowed, | ||||
| by using SECINFO and SECINFO_NONAME, the server | ||||
| <bcp14>SHOULD NOT</bcp14> present a different view of the namespace based on | ||||
| the security mechanism being used by a client. Instead, it | ||||
| should present a consistent view and return NFS4ERR_WRONGSEC | ||||
| if an attempt is made to access data with an inappropriate | ||||
| security mechanism. | ||||
| </t> | ||||
| <t> | ||||
| If security considerations make it necessary to hide the existence | ||||
| of a particular file system, as opposed to all of the data within | ||||
| it, the server can apply the security policy of | ||||
| a shared resource in the server's namespace to components of the | ||||
| resource's ancestors. For example: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| / (place holder/not exported) | ||||
| /a/b (file system 1) | ||||
| /a/b/MySecretProject (file system 2)]]></artwork> | ||||
| <t> | ||||
| The /a/b/MySecretProject directory is a real file system and | ||||
| is the shared resource. | ||||
| Suppose the security policy for /a/b/MySecretProject is Kerberos | ||||
| with integrity and it is desired to limit knowledge of the existence | ||||
| of this file system. In this case, the | ||||
| server should apply the same security policy to /a/b. This allows | ||||
| for knowledge of the existence of a file system to be secured | ||||
| when desirable. | ||||
| </t> | ||||
| <t> | ||||
| When the server's resources use multiple, disjoint security mechanisms, | ||||
| applying that sort of policy would leave | ||||
| the higher-level file system inaccessible via any | ||||
| security flavor. | ||||
| Therefore, that sort of configuration is not compatible | ||||
| with hiding the existence (as opposed to the contents) from clients | ||||
| using multiple disjoint sets of security flavors. | ||||
| </t> | ||||
| <t> | ||||
| In other circumstances, a desirable policy is for the security of a | ||||
| particular object in the | ||||
| server's namespace to include the union of all security mechanisms of | ||||
| all direct descendants. A common and convenient practice, unless | ||||
| strong security requirements dictate otherwise, is to make the | ||||
| entire pseudo file system accessible by all of the valid security | ||||
| mechanisms. | ||||
| </t> | ||||
| <t> | ||||
| Where there is concern about the security of data on the network, | ||||
| clients should use strong security mechanisms to access the pseudo | ||||
| file system in order to prevent man-in-the-middle attacks. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>State Management</name> | ||||
| <t> | ||||
| Integrating locking into the NFS protocol necessarily causes it to be | ||||
| stateful. With the inclusion of such features as share reservations, | ||||
| file and directory delegations, recallable layouts, and support for | ||||
| mandatory byte-range locking, the protocol becomes substantially more | ||||
| dependent on proper management of state than the traditional | ||||
| combination of NFS and NLM (Network Lock Manager) | ||||
| <xref target="xnfs" format="default"/>. These features include expanded | ||||
| locking facilities, which provide some measure of inter-client | ||||
| exclusion, but the state also offers | ||||
| features not readily providable using a stateless model. | ||||
| There are three components to | ||||
| making this state manageable: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| clear division between client and server | ||||
| </li> | ||||
| <li> | ||||
| ability to reliably detect inconsistency in state between client | ||||
| and server | ||||
| </li> | ||||
| <li> | ||||
| simple and robust recovery mechanisms | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In this model, the server owns the state information. The client | ||||
| requests changes in locks and the server responds with the changes | ||||
| made. Non-client-initiated changes in locking state are infrequent. | ||||
| The client receives prompt notification of such changes and can adjust | ||||
| its view of the locking state to reflect the server's changes. | ||||
| </t> | ||||
| <t> | ||||
| Individual pieces of state created by the server and passed to the | ||||
| client at its request are represented by 128-bit stateids. These | ||||
| stateids may represent a particular open file, a set of | ||||
| byte-range locks held | ||||
| by a particular owner, or a recallable delegation of privileges | ||||
| to access a file in particular ways or at a particular location. | ||||
| </t> | ||||
| <t> | ||||
| In all cases, there is a transition from the most general | ||||
| information that represents a client as a whole to the eventual | ||||
| lightweight stateid used for most client and server | ||||
| locking interactions. The details of this transition will vary | ||||
| with the type of object but it always starts with a client ID. | ||||
| </t> | ||||
| <section anchor="client_id" numbered="true" toc="default"> | ||||
| <name>Client and Session ID</name> | ||||
| <t> | ||||
| A client must establish a client ID (see <xref target="Client_Identifiers" format="default"/>) | ||||
| and then one or more sessionids (see <xref target="Session" format="default"/>) before | ||||
| performing any operations to open, byte-range lock, delegate, or obtain | ||||
| a layout for a file object. | ||||
| Each session ID is associated with a specific client ID, and thus | ||||
| serves as a shorthand reference to an NFSv4.1 client. | ||||
| </t> | ||||
| <t> | ||||
| For some types of locking interactions, the client will represent | ||||
| some number of internal locking entities called "owners", which | ||||
| normally correspond to processes internal to the client. For | ||||
| other types of locking-related objects, such as delegations and | ||||
| layouts, no such intermediate entities are provided for, and the | ||||
| locking-related objects are considered to be transferred | ||||
| directly between the server and a unitary client. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Client and Session ID" --> | ||||
| <section anchor="stateid" numbered="true" toc="default"> | ||||
| <name>Stateid Definition</name> | ||||
| <t> | ||||
| When the server grants a lock of any type (including opens, | ||||
| byte-range locks, delegations, and layouts), it responds with a | ||||
| unique stateid that represents a set of locks (often a single | ||||
| lock) for the same file, of the same type, and sharing the same | ||||
| ownership characteristics. Thus, opens of the same file by | ||||
| different open-owners each have an identifying stateid. Similarly, | ||||
| each set of byte-range locks on a file owned by a specific lock-owner | ||||
| has its own | ||||
| identifying stateid. Delegations and layouts also have | ||||
| associated stateids by which they may be referenced. | ||||
| The stateid is used as a shorthand reference to a lock or set | ||||
| of locks, and given a stateid, the server can determine the associated | ||||
| state-owner or state-owners (in the case of an open-owner/lock-owner pair) | ||||
| and the associated filehandle. When stateids are used, the current | ||||
| filehandle must be the one associated with that stateid. | ||||
| </t> | ||||
| <t> | ||||
| All stateids associated with a given client ID are associated with | ||||
| a common lease that represents the claim of those stateids | ||||
| and the objects they represent to be maintained | ||||
| by the server. See <xref target="lease_renewal" format="default"/> for a | ||||
| discussion of the lease. | ||||
| </t> | ||||
| <t> | ||||
| The server may assign stateids independently for different clients. | ||||
| A stateid with the same bit pattern for one client may designate | ||||
| an entirely different set of locks for a different client. The | ||||
| stateid is always interpreted with respect to the client ID associated | ||||
| with the current session. Stateids apply to all sessions associated | ||||
| with the given client ID, and the client may use a stateid obtained from | ||||
| one session on another session associated with the same client ID. | ||||
| </t> | ||||
| <section anchor="stateid_types" numbered="true" toc="default"> | ||||
| <name>Stateid Types</name> | ||||
| <t> | ||||
| With the exception of special stateids (see <xref target="special_stateid" format="default"/>), | ||||
| each stateid | ||||
| represents locking objects of one of a set of types defined | ||||
| by the NFSv4.1 protocol. Note that in all these cases, where | ||||
| we speak of a guarantee, it is understood that there are | ||||
| situations such as a client restart, or lock revocation, | ||||
| that allow the guarantee to be voided. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Stateids may represent opens of files. | ||||
| </t> | ||||
| <t> | ||||
| Each stateid in this case represents the OPEN state for a | ||||
| given client ID/open-owner/filehandle triple. Such | ||||
| stateids are subject to change (with consequent | ||||
| incrementing of the stateid's seqid) in response to OPENs that | ||||
| result in upgrade and OPEN_DOWNGRADE operations. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Stateids may represent sets of byte-range locks. | ||||
| </t> | ||||
| <t> | ||||
| All locks held on a particular file by a particular owner and | ||||
| gotten under the aegis of a particular open file | ||||
| are associated with a single stateid with the seqid | ||||
| being incremented whenever LOCK and LOCKU operations affect that | ||||
| set of locks. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Stateids may represent file delegations, which are | ||||
| recallable guarantees by the server to the client | ||||
| that other clients will not reference or | ||||
| modify a particular file, until the delegation | ||||
| is returned. In NFSv4.1, file delegations may be | ||||
| obtained on both regular and non-regular files. | ||||
| </t> | ||||
| <t> | ||||
| A stateid represents a single delegation held by | ||||
| a client for a particular filehandle. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Stateids may represent directory delegations, which | ||||
| are recallable guarantees by the server to the client | ||||
| that other clients will not modify the directory, | ||||
| until the delegation is returned. | ||||
| </t> | ||||
| <t> | ||||
| A stateid represents a single delegation held by | ||||
| a client for a particular directory filehandle. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Stateids may represent layouts, which are recallable | ||||
| guarantees by the server to the client that particular | ||||
| files may be accessed via an alternate data access | ||||
| protocol at specific locations. Such access is | ||||
| limited to particular sets of byte-ranges and may | ||||
| proceed until those byte-ranges are reduced or the | ||||
| layout is returned. | ||||
| </t> | ||||
| <t> | ||||
| A stateid represents the set of all layouts held by a particular | ||||
| client for a particular filehandle with a given | ||||
| layout type. The seqid is updated as the layouts | ||||
| of that set of byte-ranges change, via layout-stateid-changing operations such | ||||
| as LAYOUTGET and LAYOUTRETURN. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="stateid_structure" numbered="true" toc="default"> | ||||
| <name>Stateid Structure</name> | ||||
| <t> | ||||
| Stateids are divided into two fields, a 96-bit | ||||
| "other" field identifying the specific set | ||||
| of locks and a 32-bit "seqid" sequence value. | ||||
| Except in the case of special stateids | ||||
| (see <xref target="special_stateid" format="default"/>), | ||||
| a particular value of the | ||||
| "other" field denotes a | ||||
| set of locks of the same type (for example, | ||||
| byte-range locks, opens, delegations, or layouts), | ||||
| for a specific file or directory, and sharing | ||||
| the same ownership characteristics. The seqid | ||||
| designates a specific instance of such a set of | ||||
| locks, and is incremented to indicate changes in | ||||
| such a set of locks, either by the addition or | ||||
| deletion of locks from the set, a change in the | ||||
| byte-range they apply to, or an upgrade or downgrade | ||||
| in the type of one or more locks. | ||||
| </t> | ||||
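| <t> | ||||
| As a minimal sketch of the two-field layout described above, the | ||||
| following Python fragment packs and unpacks a stateid as a 4-byte | ||||
| big-endian seqid followed by the 12-byte "other" field, which is how | ||||
| XDR encodes a 32-bit unsigned integer followed by a fixed-length | ||||
| 12-byte opaque. The function names are hypothetical and are used | ||||
| only for illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
import struct
from typing import Tuple

OTHER_LEN = 12  # 96 bits


def encode_stateid(seqid: int, other: bytes) -> bytes:
    """Pack a stateid into its 16-byte (128-bit) wire form."""
    if len(other) != OTHER_LEN:
        raise ValueError("the 'other' field must be exactly 12 bytes")
    # ">I" is a big-endian unsigned 32-bit integer; "12s" is 12 raw bytes.
    return struct.pack(">I12s", seqid, other)


def decode_stateid(data: bytes) -> Tuple[int, bytes]:
    """Split a 16-byte stateid into (seqid, other)."""
    seqid, other = struct.unpack(">I12s", data)
    return seqid, other
```
| ]]></sourcecode> | ||||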
| <t> | ||||
| When such a set of locks is first created, the server returns a | ||||
| stateid with a seqid value of one. On subsequent | ||||
| operations that modify the set of locks, the server | ||||
| is required to increment the "seqid" field by one | ||||
| whenever it returns a stateid for the same | ||||
| state-owner/file/type combination and there is some | ||||
| change in the set of locks actually designated. | ||||
| In this case, the server will return a stateid with an "other" field | ||||
| the same as previously used for that | ||||
| state-owner/file/type combination, with an | ||||
| incremented "seqid" field. | ||||
| This pattern continues until the seqid is incremented | ||||
| past NFS4_UINT32_MAX, and one | ||||
| (not zero) is the next seqid value. | ||||
| </t> | ||||
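| <t> | ||||
| The increment rule, including the step from NFS4_UINT32_MAX back to | ||||
| one, can be sketched as follows. The helper name next_seqid is an | ||||
| assumption made for this example; only the numeric behavior is taken | ||||
| from the text above. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def next_seqid(seqid: int) -> int:
    """Return the seqid that follows `seqid` for the same set of locks.

    Zero is never produced: it is the value a client sends to ask for
    the most up-to-date seqid, so the counter wraps from the maximum
    back to one.
    """
    if not 1 <= seqid <= NFS4_UINT32_MAX:
        raise ValueError("server-assigned seqids lie in 1..NFS4_UINT32_MAX")
    return 1 if seqid == NFS4_UINT32_MAX else seqid + 1
```
| ]]></sourcecode> | ||||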
| <t> | ||||
| The purpose of the incrementing of the seqid | ||||
| is to allow the server to | ||||
| communicate to the client the order in which | ||||
| operations that modified locking state associated | ||||
| with a stateid have been processed and to make | ||||
| it possible for the client to send requests | ||||
| that are conditional on the set of locks not | ||||
| having changed since the stateid in question | ||||
| was returned. | ||||
| </t> | ||||
| <t> | ||||
| Except for layout stateids (<xref target="layout_stateid" format="default"/>), | ||||
| when a client sends a stateid to the server, it has two | ||||
| choices with regard to the seqid sent. It may set the seqid | ||||
| to zero to indicate to the server that it wishes the most | ||||
| up-to-date seqid for that stateid's "other" field to be | ||||
| used. This would be the common choice in the case of a | ||||
| stateid sent with a READ or WRITE operation. It also may | ||||
| set a non-zero value, in which case the server checks if that | ||||
| seqid is the correct one. In that case, the server is | ||||
| required to return NFS4ERR_OLD_STATEID if the seqid is lower | ||||
| than the most current value and NFS4ERR_BAD_STATEID if the | ||||
| seqid is greater than the most current value. This would be | ||||
| the common choice in the case of stateids sent with a CLOSE | ||||
| or OPEN_DOWNGRADE. Because OPENs may be sent in parallel | ||||
| for the same owner, a client might close a file without | ||||
| knowing that an OPEN upgrade had been done by the server, | ||||
| changing the lock in question. If CLOSE were sent with a | ||||
| zero seqid, the OPEN upgrade would be cancelled before the | ||||
| client even received an indication that an upgrade had | ||||
| happened. | ||||
| </t> | ||||
| <t> | ||||
| When a stateid is sent by the server to the client as part of | ||||
| a callback operation, it is not subject to checking for | ||||
| a current seqid and returning NFS4ERR_OLD_STATEID. This | ||||
| is because the client is not in a position to know the | ||||
| most up-to-date seqid and thus cannot verify it. Unless | ||||
| specially noted, the seqid value for a stateid sent by the | ||||
| server to the client as part of a callback is required | ||||
| to be zero with NFS4ERR_BAD_STATEID returned if it is | ||||
| not. | ||||
| </t> | ||||
| <t> | ||||
| In making comparisons between seqids, both by the client | ||||
| in determining the order of operations and by the server | ||||
| in determining whether the NFS4ERR_OLD_STATEID is to be | ||||
| returned, the possibility of the seqid being wrapped | ||||
| around past the NFS4_UINT32_MAX value needs to be taken | ||||
| into account. When two seqid values are being compared, | ||||
| the total count of slots for all sessions associated | ||||
| with the current client is used to do this. When one | ||||
| seqid value is less than this total slot count and | ||||
| another seqid value is greater than NFS4_UINT32_MAX | ||||
| minus the total slot count, the former is to be treated | ||||
| as higher than the latter, despite the fact that it is | ||||
| numerically lower. | ||||
| </t> | ||||
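| <t> | ||||
| One possible reading of this comparison is sketched below: a seqid | ||||
| just past the wrap point is treated as more recent than one just | ||||
| before it, and all other pairs are compared numerically. The function | ||||
| name and the assumption that the total slot count is far smaller than | ||||
| NFS4_UINT32_MAX are illustrative only. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def seqid_is_older(a: int, b: int, total_slots: int) -> bool:
    """Return True if seqid `a` was issued before seqid `b`.

    `total_slots` is the total slot count across all sessions of the
    client; it bounds how far apart two live seqids are assumed to be.
    """
    low = total_slots                     # values below this may have wrapped
    high = NFS4_UINT32_MAX - total_slots  # values above this are near the wrap
    if a > high and b < low:
        return True    # b has wrapped past the maximum, so a is older
    if a < low and b > high:
        return False   # a has wrapped past the maximum, so a is newer
    return a < b       # no wraparound in play: plain numeric order
```
| ]]></sourcecode> | ||||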
| </section> | ||||
| <!-- [auth] "Stateid Structure" --> | ||||
| <section anchor="special_stateid" numbered="true" toc="default"> | ||||
| <name>Special Stateids</name> | ||||
| <t> | ||||
| Stateid values whose "other" field is either all zeros or all | ||||
| ones are reserved. They may not be assigned by the server but | ||||
| have special meanings defined by the protocol. The particular | ||||
| meaning depends on whether the "other" field is all zeros or | ||||
| all ones and the specific value of the "seqid" field. | ||||
| </t> | ||||
| <t> | ||||
| The following combinations of "other" and "seqid" are defined | ||||
| in NFSv4.1: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When "other" and "seqid" are both zero, the | ||||
| stateid is treated as a special anonymous | ||||
| stateid, which can be used in READ, WRITE, | ||||
| and SETATTR requests to indicate the absence | ||||
| of any OPEN state associated with the | ||||
| request. When an anonymous stateid value is | ||||
| used and an existing open denies the form of | ||||
| access requested, then access will be denied | ||||
| to the request. This stateid <bcp14>MUST NOT</bcp14> be | ||||
| used on operations to data servers (<xref target="ds_ops" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| When "other" and "seqid" are both all ones, | ||||
| the stateid is a special READ bypass stateid. | ||||
| When this value is used in WRITE or SETATTR, | ||||
| it is treated like the anonymous value. | ||||
| When used in READ, the server <bcp14>MAY</bcp14> grant | ||||
| access, even if access would normally be | ||||
| denied to READ operations. This stateid <bcp14>MUST | ||||
| NOT</bcp14> be used on operations to data servers. | ||||
| </li> | ||||
| <li> | ||||
| When "other" is zero and "seqid" is one, | ||||
| the stateid represents the current stateid, | ||||
| which is whatever value is the last stateid | ||||
| returned by an operation within the COMPOUND. | ||||
| In the case of an OPEN, the stateid returned | ||||
| for the open file and not the delegation is | ||||
| used. The stateid passed to the operation in | ||||
| place of the special value has its "seqid" | ||||
| value set to zero, except when the current | ||||
| stateid is used by the operation CLOSE or | ||||
| OPEN_DOWNGRADE. If there is no operation | ||||
| in the COMPOUND that has returned a stateid | ||||
| value, the server <bcp14>MUST</bcp14> return the error | ||||
| NFS4ERR_BAD_STATEID. As illustrated in <xref target="csid_example4" format="default"/>, if the value of a | ||||
| current stateid is a special stateid and the | ||||
| stateid of an operation's arguments has | ||||
| "other" set to zero and "seqid" set to one, | ||||
| then the server <bcp14>MUST</bcp14> return the error | ||||
| NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| When "other" is zero and "seqid" is NFS4_UINT32_MAX, | ||||
| the stateid represents a reserved stateid | ||||
| value defined to be invalid. When this | ||||
| stateid is used, the server <bcp14>MUST</bcp14> return the error | ||||
| NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If a stateid value is used that has all zeros or all ones in the | ||||
| "other" field but does not match one of the cases above, the server | ||||
| <bcp14>MUST</bcp14> return the error NFS4ERR_BAD_STATEID. | ||||
| </t> | ||||
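| <t> | ||||
| The combinations listed above can be summarized in a small | ||||
| classifier. This is a sketch only: the function name and the string | ||||
| labels it returns are hypothetical, and context-dependent checks | ||||
| (for example, whether a current stateid exists in the COMPOUND) are | ||||
| left out. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Optional

NFS4_UINT32_MAX = 0xFFFFFFFF
ALL_ZEROS = bytes(12)
ALL_ONES = b"\xff" * 12


def classify_special(other: bytes, seqid: int) -> Optional[str]:
    """Label a stateid whose 'other' field is reserved.

    Returns None when 'other' is neither all zeros nor all ones, which
    means the stateid is an ordinary, server-assigned one.
    """
    if other == ALL_ZEROS:
        if seqid == 0:
            return "anonymous"
        if seqid == 1:
            return "current"
        if seqid == NFS4_UINT32_MAX:
            return "invalid"            # always NFS4ERR_BAD_STATEID
        return "NFS4ERR_BAD_STATEID"    # reserved but undefined
    if other == ALL_ONES:
        if seqid == NFS4_UINT32_MAX:    # seqid is all ones as well
            return "read_bypass"
        return "NFS4ERR_BAD_STATEID"    # reserved but undefined
    return None
```
| ]]></sourcecode> | ||||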
| <t> | ||||
| Special stateids, unlike other stateids, are not associated with | ||||
| individual client IDs or filehandles and can be used with all valid | ||||
| client IDs and filehandles. In the case of a special | ||||
| stateid designating the current stateid, the current stateid | ||||
| value substituted for the special stateid is associated with a | ||||
| particular client ID and filehandle, and so, if it is used | ||||
| where the current filehandle does not match that associated with the current | ||||
| stateid, the operation to which the stateid is passed will return | ||||
| NFS4ERR_BAD_STATEID. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Special Stateids" --> | ||||
| <section anchor="stateid_lifetime" numbered="true" toc="default"> | ||||
| <name>Stateid Lifetime and Validation</name> | ||||
| <t> | ||||
| Stateids must remain valid until either a client restart or a | ||||
| server restart or until the client returns all of the locks | ||||
| associated with the stateid by means of an operation such as | ||||
| CLOSE or DELEGRETURN. | ||||
| If the locks are lost due to revocation, as long | ||||
| as the client ID is valid, the stateid remains | ||||
| a valid designation of that revoked state until | ||||
| the client frees it by using FREE_STATEID. | ||||
| Stateids associated | ||||
| with byte-range locks are an exception. They remain valid even | ||||
| if a LOCKU frees all remaining locks, so long as the open file | ||||
| with which they are associated remains open, unless the client | ||||
| frees the stateids via the FREE_STATEID operation. | ||||
| </t> | ||||
| <t> | ||||
| It should be noted that there are situations in which the | ||||
| client's locks become invalid, without the client requesting | ||||
| they be returned. These include lease expiration and a number | ||||
| of forms of lock revocation within the lease period. It is | ||||
| important to note that in these situations, the stateid remains | ||||
| valid and the client can use it to determine the disposition of | ||||
| the associated lost locks. | ||||
| </t> | ||||
| <t> | ||||
| An "other" value must never be reused for a different purpose | ||||
| (i.e., different filehandle, owner, or type of locks) within the | ||||
| context of a single client ID. A server may retain the "other" | ||||
| value for the same purpose beyond the point where it may otherwise | ||||
| be freed, but if it does so, it must maintain "seqid" continuity | ||||
| with previous values. | ||||
| </t> | ||||
| <t> | ||||
| One mechanism that may be used to satisfy the requirement that the | ||||
| server recognize invalid and out-of-date stateids is for | ||||
| the server to divide the "other" field of the stateid into two | ||||
| fields. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| an index into a table of locking-state structures. | ||||
| </li> | ||||
| <li> | ||||
| a generation number that is incremented on each allocation | ||||
| of a table entry for a particular use. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The server would then store the following in each table entry: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| the client ID with which the stateid is associated. | ||||
| </li> | ||||
| <li> | ||||
| the current generation number for the (at most one) | ||||
| valid stateid sharing this index value. | ||||
| </li> | ||||
| <li> | ||||
| the filehandle of the file on which the locks are taken. | ||||
| </li> | ||||
| <li> | ||||
| an indication of the type of stateid (open, byte-range lock, | ||||
| file delegation, directory delegation, layout). | ||||
| </li> | ||||
| <li> | ||||
| the last "seqid" value returned corresponding to the current | ||||
| "other" value. | ||||
| </li> | ||||
| <li> | ||||
| an indication of the current status of the locks | ||||
| associated with this stateid, in particular, | ||||
| whether these have been revoked and if so, for what reason. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| With this information, an incoming stateid can be validated and | ||||
| the appropriate error returned when necessary. Special and | ||||
| non-special stateids are handled separately. (See | ||||
| <xref target="special_stateid" format="default"/> for a discussion of special | ||||
| stateids.) | ||||
| </t> | ||||
| <t> | ||||
| Note that stateids are implicitly qualified by the current client | ||||
| ID, as derived from the client ID associated with the current | ||||
| session. Note, however, that the semantics of the session will | ||||
| prevent stateids associated with a previous client or server | ||||
| instance from being analyzed by this procedure. | ||||
| </t> | ||||
| <t> | ||||
| If a server restart has resulted in an invalid | ||||
| client ID or an invalid session ID, SEQUENCE will return | ||||
| an error and the operation that takes a stateid as an argument will never | ||||
| be processed. | ||||
| </t> | ||||
| <t> | ||||
| If there has been a server restart where there is a persistent | ||||
| session and all leased state has been lost, then the session | ||||
| in question will, although valid, be marked as dead, and any | ||||
| operation not satisfied by means of the reply cache will | ||||
| receive the error NFS4ERR_DEADSESSION, and thus not be | ||||
| processed as indicated below. | ||||
| </t> | ||||
| <t> | ||||
| When a stateid is being tested and the "other" field is all | ||||
| zeros or all ones, the server checks whether | ||||
| the "other" and "seqid" fields match a defined combination for | ||||
| a special stateid, and the result is determined as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the "other" and "seqid" fields do not match a defined | ||||
| combination associated with a special stateid, the error | ||||
| NFS4ERR_BAD_STATEID is returned. | ||||
| </li> | ||||
| <li> | ||||
| If the special stateid is one designating the current | ||||
| stateid and there is a current stateid, then the current | ||||
| stateid is substituted for the special stateid and the | ||||
| checks appropriate to non-special stateids are performed. | ||||
| </li> | ||||
| <li> | ||||
| If the combination is valid in general but is not | ||||
| appropriate to the context in which the stateid is used | ||||
| (e.g., an all-zero stateid is used when an OPEN stateid | ||||
| is required in a LOCK operation), the error | ||||
| NFS4ERR_BAD_STATEID is also returned. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, the check is completed and the special stateid | ||||
| is accepted as valid. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When a stateid is being tested, | ||||
| and the "other" field is neither all zeros nor all ones, the | ||||
| following procedure could be used to | ||||
| validate an incoming stateid and return an appropriate error, | ||||
| when necessary, assuming that the "other" field would be divided | ||||
| into a table index and an entry generation. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the table index field is outside the range of the | ||||
| associated table, return NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the selected table entry is of a different generation than | ||||
| that specified in the incoming stateid, return | ||||
| NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the selected table entry does not match the current | ||||
| filehandle, return NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the client ID in the table entry does not match the | ||||
| client ID associated with the current session, | ||||
| return NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the stateid represents revoked state, then return | ||||
| NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or | ||||
| NFS4ERR_DELEG_REVOKED, as appropriate. | ||||
| </li> | ||||
| <li> | ||||
| If the stateid type is not valid for the context in which the | ||||
| stateid appears, return NFS4ERR_BAD_STATEID. | ||||
| Note that a stateid may be valid in general, as would be | ||||
| reported by the TEST_STATEID operation, but be invalid for | ||||
| a particular operation, as, for example, when a stateid | ||||
| that doesn't represent byte-range locks is passed to | ||||
| the non-from_open case of LOCK or to LOCKU, or when a stateid | ||||
| that does not represent an open is passed to CLOSE or | ||||
| OPEN_DOWNGRADE. In such cases, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the "seqid" field is not zero and it is greater | ||||
| than the current sequence value corresponding to the | ||||
| current "other" field, return NFS4ERR_BAD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| If the "seqid" field is not zero and it is less | ||||
| than the current sequence value corresponding to the | ||||
| current "other" field, return NFS4ERR_OLD_STATEID. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, the stateid is valid and the table entry | ||||
| should contain any additional information about the | ||||
| type of stateid and information associated with that | ||||
| particular type of stateid, such as the associated | ||||
| set of locks, e.g., open-owner and | ||||
| lock-owner information, as well as information on the | ||||
| specific locks, e.g., open modes and byte-ranges. | ||||
| </li> | ||||
| </ul> | ||||
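| <t> | ||||
| The procedure above can be sketched in Python, applying the checks | ||||
| in the order listed. The Entry class, its field names, and the | ||||
| validate function are hypothetical, the "other" field is assumed to | ||||
| have already been split into a table index and a generation number, | ||||
| and seqid wraparound is ignored for brevity. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from dataclasses import dataclass
from typing import List, Optional, Set

BAD = "NFS4ERR_BAD_STATEID"
OLD = "NFS4ERR_OLD_STATEID"
OK = "NFS4_OK"


@dataclass
class Entry:
    """One locking-state table entry (field names are hypothetical)."""
    generation: int
    client_id: int
    filehandle: bytes
    kind: str                      # "open", "lock", "deleg", "layout", ...
    seqid: int                     # last seqid returned for this entry
    revoked: Optional[str] = None  # error to report if state was revoked


def validate(table: List[Entry], index: int, generation: int, seqid: int,
             client_id: int, current_fh: bytes, allowed: Set[str]) -> str:
    """Validate a non-special stateid and return a status name."""
    if not 0 <= index < len(table):
        return BAD                      # index outside the table
    entry = table[index]
    if entry.generation != generation:
        return BAD                      # entry was reallocated
    if entry.filehandle != current_fh:
        return BAD                      # wrong current filehandle
    if entry.client_id != client_id:
        return BAD                      # belongs to another client
    if entry.revoked is not None:
        return entry.revoked            # e.g. NFS4ERR_ADMIN_REVOKED
    if entry.kind not in allowed:
        return BAD                      # wrong type for this operation
    if seqid != 0 and seqid > entry.seqid:
        return BAD                      # seqid from the future
    if seqid != 0 and seqid < entry.seqid:
        return OLD                      # seqid is out of date
    return OK
```
| ]]></sourcecode> | ||||
| <t> | ||||
| In this sketch, the caller passes the set of stateid types that the | ||||
| operation accepts (for example, only "open" for CLOSE), which keeps | ||||
| the context check separate from the table lookup. | ||||
| </t> | ||||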
| </section> | ||||
| <!-- [auth] "Stateid Lifetime and Validation" --> | ||||
| <section anchor="stateid_use" numbered="true" toc="default"> | ||||
| <name>Stateid Use for I/O Operations</name> | ||||
| <t> | ||||
| Clients performing I/O operations need to select an | ||||
| appropriate stateid based on the | ||||
| locks (including opens and delegations) held by the client and | ||||
| the various types of state-owners sending the I/O requests. | ||||
| SETATTR operations that change the file size are treated | ||||
| like I/O operations in this regard. | ||||
| </t> | ||||
| <t> | ||||
| The following rules, applied in order of decreasing priority, | ||||
| govern the selection of the appropriate stateid. In following | ||||
| these rules, the client will only consider locks of which it | ||||
| has actually received notification by an appropriate operation | ||||
| response or callback. Note that the | ||||
| rules are slightly different in the case of I/O to data servers | ||||
| when file layouts are being | ||||
| used (see <xref target="global_stateid" format="default"/>). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the client holds a delegation for the file in question, the | ||||
| delegation stateid <bcp14>SHOULD</bcp14> be used. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, if the entity corresponding to the lock-owner (e.g., a process) | ||||
| sending the I/O has a byte-range lock stateid for the associated open file, | ||||
| then the byte-range lock stateid for that lock-owner and open file <bcp14>SHOULD</bcp14> | ||||
| be used. | ||||
| </li> | ||||
| <li> | ||||
| If there is no byte-range lock stateid, then the OPEN stateid for the open | ||||
| file in question <bcp14>SHOULD</bcp14> be used. | ||||
| </li> | ||||
| <li> | ||||
| Finally, if none of the above apply, then a special stateid | ||||
| <bcp14>SHOULD</bcp14> be used. | ||||
| </li> | ||||
| </ul> | ||||
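As an illustration, a client might apply the priority rules above roughly as follows. The `HeldState` record, the `SPECIAL_STATEID` placeholder, and the use of strings for stateids are all assumptions made for this sketch, not part of the protocol.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder standing in for one of the special stateids (assumption).
SPECIAL_STATEID = "special"


@dataclass
class HeldState:
    """Stateids the client has been notified of for one file (hypothetical)."""
    delegation_stateid: Optional[str] = None
    # Byte-range lock stateid for the lock-owner sending the I/O on this open file.
    lock_stateid: Optional[str] = None
    open_stateid: Optional[str] = None


def select_io_stateid(state: HeldState) -> str:
    """Pick the stateid for an I/O request, in decreasing priority order."""
    if state.delegation_stateid is not None:
        return state.delegation_stateid
    if state.lock_stateid is not None:
        return state.lock_stateid
    if state.open_stateid is not None:
        return state.open_stateid
    return SPECIAL_STATEID
```

For example, a client holding both an open and a byte-range lock for the file, but no delegation, would send the byte-range lock stateid.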
| <t> | ||||
| Ignoring these rules may result in situations in which the server | ||||
| does not have information necessary to properly process the request. | ||||
| For example, when mandatory byte-range locks are in effect, if the | ||||
| stateid does not indicate the proper lock-owner, via a lock stateid, | ||||
| a request might be avoidably rejected. | ||||
| </t> | ||||
| <t> | ||||
| The server, however, should not try to enforce these ordering rules | ||||
| and should use whatever information is available to properly process | ||||
| I/O requests. In particular, when a client has a delegation for a given file, the server | ||||
| <bcp14>SHOULD</bcp14> take note of this fact in processing a request, even if the request is | ||||
| sent with a special stateid. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Stateid Use for I/O Operations" --> | ||||
| <section anchor="stateid_use_sa" numbered="true" toc="default"> | ||||
| <name>Stateid Use for SETATTR Operations</name> | ||||
| <t> | ||||
| Because each operation is associated with a session ID and from that | ||||
| the client ID can be determined, operations do not need to | ||||
| include a stateid for the server to be able to determine whether | ||||
| they should cause a delegation to be recalled or are to be | ||||
| treated as done within the scope of the delegation. | ||||
| </t> | ||||
| <t> | ||||
| In the case of SETATTR operations, a stateid is present. In cases | ||||
| other than those that set the file size, the client may send either | ||||
| a special stateid or, when a delegation is held for the file in | ||||
| question, a delegation stateid. While the server <bcp14>SHOULD</bcp14> validate | ||||
| the stateid and may use the stateid to optimize the determination | ||||
| as to whether a delegation is held, it <bcp14>SHOULD</bcp14> note the presence of | ||||
| a delegation even when a special stateid is sent, and <bcp14>MUST</bcp14> accept a | ||||
| valid delegation stateid when sent. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Stateid Use for SETATTR Operations" --> | ||||
| </section> | ||||
| <!-- [auth] "Stateid Definition" --> | ||||
| <section anchor="lease_renewal" numbered="true" toc="default"> | ||||
| <name>Lease Renewal</name> | ||||
| <t> | ||||
| Each client/server pair, as represented by a client ID, has a single | ||||
| lease. | ||||
| The purpose of the lease is to allow the client to indicate | ||||
| to the server, in a low-overhead way, that it is active, and | ||||
| thus that the server is to retain the client's locks. This arrangement | ||||
| allows the server to remove stale locking-related objects | ||||
| that are held by a client that has crashed or is otherwise | ||||
| unreachable, once the relevant lease expires. This in turn allows | ||||
| other clients to obtain conflicting locks without being | ||||
| delayed indefinitely by inactive or unreachable clients. | ||||
| The lease is not a | ||||
| mechanism for cache consistency, and lease | ||||
| renewals may not be denied if the lease interval has not expired. | ||||
| </t> | ||||
| <t> | ||||
| Since each session is associated with a specific | ||||
| client (identified by the client's client ID), any | ||||
| operation sent on that session is an indication | ||||
| that the associated client is reachable. When a | ||||
| request is sent for a given session, successful | ||||
| execution of a SEQUENCE operation (or successful | ||||
| retrieval of the result of SEQUENCE from the reply | ||||
| cache) on an unexpired lease will result in the | ||||
| lease being implicitly renewed, for the standard | ||||
| renewal period (equal to the lease_time attribute). | ||||
| </t> | ||||
| <t> | ||||
| If the client ID's lease has not expired when the | ||||
| server receives a SEQUENCE operation, then the server | ||||
| <bcp14>MUST</bcp14> renew the lease. If the client ID's lease has expired | ||||
| when the server receives a SEQUENCE operation, the | ||||
| server <bcp14>MAY</bcp14> renew the lease; this depends on whether | ||||
| any state was revoked as a result of the client's | ||||
| failure to renew the lease before expiration. | ||||
| </t> | ||||
| <t> | ||||
| Absent other activity that would renew the lease, a COMPOUND | ||||
| consisting of a single SEQUENCE operation will suffice. The | ||||
| client should also take communication-related delays into | ||||
| account and take steps to ensure that the renewal messages | ||||
| actually reach the server in good time. For example: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When trunking is in effect, the client should | ||||
| consider sending multiple requests on different | ||||
| connections, in order to ensure that renewal | ||||
| occurs, even in the event of blockage in the | ||||
| path used for one of those connections. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Transport retransmission delays might become | ||||
| so large as to approach or exceed the length | ||||
| of the lease period. This may be particularly | ||||
| likely when the server is unresponsive due to | ||||
| a restart; see <xref target="reclaim_locks" format="default"/>. If the client implementation is not careful, | ||||
| transport retransmission delays can result in the | ||||
| client failing to detect a server restart before | ||||
| the grace period ends. The scenario is that the | ||||
| client is using a transport with exponential | ||||
| backoff, such that the maximum retransmission | ||||
| timeout exceeds both the grace period and the | ||||
| lease_time attribute. A network partition causes | ||||
| the client's connection's retransmission interval | ||||
| to back off, and even after the partition heals, | ||||
| the next transport-level retransmission is sent | ||||
| after the server has restarted and its grace | ||||
| period ends. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>MUST</bcp14> either recover from the ensuing | ||||
| NFS4ERR_NO_GRACE errors or it <bcp14>MUST</bcp14> ensure that, | ||||
| despite transport-level retransmission intervals | ||||
| that exceed the lease_time, a SEQUENCE operation is sent | ||||
| that renews the lease before expiration. The client can achieve this | ||||
| by associating a new connection with the session, | ||||
| and sending a SEQUENCE operation on it. However, if | ||||
| the attempt to establish a new connection is delayed | ||||
| for some reason (e.g., exponential backoff of the connection | ||||
| establishment packets), the client will have to | ||||
| abort the connection establishment attempt before | ||||
| the lease expires, and attempt to reconnect. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the server renews the lease upon receiving | ||||
| a SEQUENCE operation, the server <bcp14>MUST NOT</bcp14> allow the lease | ||||
| to expire while the rest of the operations | ||||
| in the COMPOUND procedure's request are still | ||||
| executing. Once the last operation has finished, and | ||||
| the response to COMPOUND has been sent, the server | ||||
| <bcp14>MUST</bcp14> set the lease to expire no sooner than the | ||||
| sum of the current time and the value of the lease_time attribute. | ||||
| </t> | ||||
| <t> | ||||
| A client ID's lease can expire when it has been | ||||
| at least the lease interval (lease_time) since the | ||||
| last lease-renewing SEQUENCE operation was sent | ||||
| on any of the client ID's sessions and there | ||||
| are no active COMPOUND operations on any such sessions. | ||||
| </t> | ||||
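The renewal and expiration rules above can be sketched from the server's point of view as follows. The `Lease` class, its field names, and the policy of declining renewal only when state was already revoked are assumptions made for illustration; a real server tracks considerably more.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Per-client-ID lease bookkeeping (hypothetical sketch)."""
    lease_time: float          # value of the lease_time attribute, in seconds
    expires_at: float          # earliest time at which the lease can expire
    active_compounds: int = 0  # COMPOUNDs still executing on any session

    def is_expired(self, now: float) -> bool:
        # The lease cannot expire while a COMPOUND is still executing.
        return self.active_compounds == 0 and now >= self.expires_at

    def on_sequence(self, now: float, state_was_revoked: bool = False) -> bool:
        """Handle SEQUENCE; return True if the lease was renewed."""
        if self.is_expired(now) and state_was_revoked:
            # Renewal of an expired lease is optional; this sketch declines
            # it once state has been revoked (an assumed policy).
            return False
        self.active_compounds += 1
        self.expires_at = max(self.expires_at, now + self.lease_time)
        return True

    def on_compound_done(self, now: float) -> None:
        """Call after the COMPOUND response is sent, if on_sequence renewed."""
        self.active_compounds -= 1
        # Expire no sooner than the current time plus lease_time.
        self.expires_at = max(self.expires_at, now + self.lease_time)
```

Note that `on_compound_done` is meant to be paired with a successful `on_sequence`; calling it otherwise would corrupt the counter.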
| <t> | ||||
| Because the SEQUENCE operation is the basic mechanism to renew | ||||
| a lease, and because it must be done at least once for each | ||||
| lease period, it is the natural mechanism whereby the server | ||||
| will inform the client of changes in the lease status that the | ||||
| client needs to be informed of. The client should inspect the | ||||
| status flags (sr_status_flags) returned by SEQUENCE and take | ||||
| the appropriate action (see | ||||
| <xref target="OP_SEQUENCE_DESCRIPTION" format="default"/> for details). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The status bits SEQ4_STATUS_CB_PATH_DOWN and | ||||
| SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with | ||||
| the backchannel that the client may need to address | ||||
| in order to receive callback requests. | ||||
| </li> | ||||
| <li> | ||||
| The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and | ||||
| SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate | ||||
| problems with GSS contexts or RPCSEC_GSS handles | ||||
| for the backchannel that the | ||||
| client might have to address in order to allow callback requests | ||||
| to be sent. | ||||
| </li> | ||||
| <li> | ||||
| The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | ||||
| SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, | ||||
| SEQ4_STATUS_ADMIN_STATE_REVOKED, and | ||||
| SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the | ||||
| client of lock revocation events. When these bits | ||||
| are set, the client should use TEST_STATEID to find | ||||
| which stateids have been revoked and use FREE_STATEID | ||||
| to acknowledge loss of the associated state. | ||||
| </li> | ||||
| <li> | ||||
| The status bit SEQ4_STATUS_LEASE_MOVED | ||||
| indicates that | ||||
| responsibility for lease renewal has been transferred to | ||||
| one or more new servers. | ||||
| </li> | ||||
| <li> | ||||
| The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED | ||||
| indicates that due to server | ||||
| restart the client must reclaim locking state. | ||||
| </li> | ||||
| <li> | ||||
| The status bit SEQ4_STATUS_BACKCHANNEL_FAULT | ||||
| indicates that the server has encountered an unrecoverable fault | ||||
| with the backchannel (e.g., it has lost track of a | ||||
| sequence ID for a slot in the backchannel). | ||||
| </li> | ||||
| </ul> | ||||
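A client's inspection of sr_status_flags might look roughly like the following sketch. The bit values are those given for sr_status_flags in the SEQUENCE operation description; the function name `actions_for` and the action strings are hypothetical labels chosen for this example only.

```python
from typing import List

# Bit values of sr_status_flags, per the SEQUENCE operation description.
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040
SEQ4_STATUS_LEASE_MOVED = 0x00000080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100
SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200
SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400

# (mask, label) pairs; the labels are illustrative, not protocol elements.
_ACTIONS = [
    (SEQ4_STATUS_CB_PATH_DOWN | SEQ4_STATUS_CB_PATH_DOWN_SESSION,
     "check backchannel path"),
    (SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED,
     "refresh backchannel GSS contexts"),
    (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
     | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
     | SEQ4_STATUS_ADMIN_STATE_REVOKED
     | SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
     "TEST_STATEID then FREE_STATEID"),
    (SEQ4_STATUS_LEASE_MOVED, "lease renewal moved to other server(s)"),
    (SEQ4_STATUS_RESTART_RECLAIM_NEEDED, "reclaim locking state"),
    (SEQ4_STATUS_BACKCHANNEL_FAULT, "recover from backchannel fault"),
]


def actions_for(sr_status_flags: int) -> List[str]:
    """Return the labels of every group that has at least one bit set."""
    return [label for mask, label in _ACTIONS if sr_status_flags & mask]
```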
| </section> | ||||
| <!-- [auth] "Lease Renewal" --> | ||||
| <section anchor="lock_crash_recovery" numbered="true" toc="default"> | ||||
| <name>Crash Recovery</name> | ||||
| <t> | ||||
| A critical requirement in crash recovery is that both the client | ||||
| and the server know when the other has failed. Additionally, it | ||||
| is required that a client sees a consistent view of data across | ||||
| server restarts. All READ and WRITE operations that | ||||
| may have been queued within the client or network buffers must | ||||
| wait until the client has successfully recovered the locks | ||||
| protecting the READ and WRITE operations. Any that reach the | ||||
| server before the server can safely determine that the client | ||||
| has recovered enough locking state to be sure that such | ||||
| operations can be safely processed must be rejected. | ||||
| This will happen because either: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The state presented is no longer valid since it is | ||||
| associated with a now invalid client ID. In this case, the | ||||
| client will receive either an NFS4ERR_BADSESSION or | ||||
| NFS4ERR_DEADSESSION error, and any attempt to attach a new | ||||
| session to that invalid client ID will result in an | ||||
| NFS4ERR_STALE_CLIENTID error. | ||||
| </li> | ||||
| <li> | ||||
| Subsequent recovery of locks may make execution of the | ||||
| operation inappropriate (NFS4ERR_GRACE). | ||||
| </li> | ||||
| </ul> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Client Failure and Recovery</name> | ||||
| <t> | ||||
| In the event that a client fails, the server may release the | ||||
| client's locks when the associated lease has expired. Conflicting | ||||
| locks from another client may only be granted after this lease | ||||
| expiration. As discussed in <xref target="lease_renewal" format="default"/>, when | ||||
| a client has not failed and re-establishes its lease before expiration | ||||
| occurs, requests for conflicting locks will not be granted. | ||||
| </t> | ||||
| <t> | ||||
| To minimize client delay upon restart, lock requests are associated | ||||
| with an instance of the client by a client-supplied verifier. This | ||||
| verifier is part of the client_owner4 sent in the initial | ||||
| EXCHANGE_ID call made by the client. | ||||
| The server returns a client ID as a result of the EXCHANGE_ID | ||||
| operation. The client then confirms the use of the client ID by | ||||
| establishing a session associated with that client ID (see | ||||
| <xref target="OP_CREATE_SESSION_DESCRIPTION" format="default"/> for a | ||||
| description of how this is done). All locks, | ||||
| including opens, byte-range locks, delegations, and layouts obtained | ||||
| by sessions using that client ID, are associated with that client ID. | ||||
| </t> | ||||
| <t> | ||||
| Since the verifier will be changed by the client upon each | ||||
| initialization, the server can compare a new verifier to the verifier | ||||
| associated with currently held locks and determine that they do not | ||||
| match. This signifies the client's new instantiation and subsequent | ||||
| loss (upon confirmation of the new client ID) of locking | ||||
| state. As a result, the server is free to release all | ||||
| locks held that are associated with the old client ID that was | ||||
| derived from the old verifier. At this point, conflicting locks from | ||||
| other clients, kept waiting while the lease had not yet expired, can | ||||
| be granted. In addition, all stateids associated with the old client ID | ||||
| can also be freed, as they are no longer reference-able. | ||||
| </t> | ||||
| <t> | ||||
| Note that the verifier must have the same uniqueness properties as the | ||||
| verifier for the COMMIT operation. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Client Failure and Recovery" --> | ||||
| <section anchor="server_failure" numbered="true" toc="default"> | ||||
| <name>Server Failure and Recovery</name> | ||||
| <t> | ||||
| If the server loses locking state (usually as a result of a restart), it must allow clients time to discover this fact and | ||||
| re-establish the lost locking state. The client must be able to | ||||
| re-establish the locking state without having the server deny valid | ||||
| requests because the server has granted conflicting access to another | ||||
| client. Likewise, if there is a possibility that clients have not | ||||
| yet re-established their locking state for a file and that | ||||
| such locking state might make it invalid to perform READ or | ||||
| WRITE operations (for example, if mandatory locks are a possibility), | ||||
| the server must disallow READ and WRITE operations for that file. | ||||
| </t> | ||||
| <t> | ||||
| A client can determine that loss of locking | ||||
| state has occurred via several methods. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| When a SEQUENCE (most common) or other operation returns | ||||
| NFS4ERR_BADSESSION, this may mean that the session has | ||||
| been destroyed but the client ID is still valid. | ||||
| The client sends a CREATE_SESSION request with the | ||||
| client ID to re-establish the session. If | ||||
| CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | ||||
| the client must establish a new client ID (see | ||||
| <xref target="client_id" format="default"/>) and re-establish its | ||||
| lock state with the new client ID, after the CREATE_SESSION | ||||
| operation succeeds (see <xref target="reclaim_locks" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| When a SEQUENCE (most common) or other operation on a | ||||
| persistent session returns NFS4ERR_DEADSESSION, this indicates | ||||
| that a session is no longer usable for new, i.e., not satisfied | ||||
| from the reply cache, operations. Once all pending operations | ||||
| are determined to be either performed before the retry or not | ||||
| performed, the client sends a CREATE_SESSION request with the | ||||
| client ID to re-establish the session. If | ||||
| CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | ||||
| the client must establish a new client ID (see | ||||
| <xref target="client_id" format="default"/>) and re-establish its | ||||
| lock state after the CREATE_SESSION, with the | ||||
| new client ID, succeeds | ||||
| (<xref target="reclaim_locks" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| When an operation, neither SEQUENCE nor preceded by SEQUENCE (for | ||||
| example, CREATE_SESSION, DESTROY_SESSION), returns | ||||
| NFS4ERR_STALE_CLIENTID, the client <bcp14>MUST</bcp14> establish | ||||
| a new client ID (<xref target="client_id" format="default"/>) and | ||||
| re-establish its lock state (<xref target="reclaim_locks" format="default"/>). | ||||
| </li> | ||||
| </ol> | ||||
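The three cases above can be summarized in a small planning function. This is a sketch under stated assumptions: `recovery_steps` is a hypothetical helper, the step names are informal labels rather than wire operations in every case, and the new client ID is assumed to be obtained with EXCHANGE_ID and confirmed with CREATE_SESSION as described for client startup.

```python
from typing import List, Optional

# Steps taken once the old client ID is known to be stale (informal labels).
NEW_CLIENT_ID_STEPS = [
    "EXCHANGE_ID",
    "CREATE_SESSION",
    "reclaim locks",
    "RECLAIM_COMPLETE",
]


def recovery_steps(error: str,
                   create_session_error: Optional[str] = None) -> List[str]:
    """Plan recovery for an error that signals possible loss of state.

    create_session_error is the outcome of the CREATE_SESSION attempted for
    a bad or dead session, or None if that attempt succeeded.
    """
    if error == "NFS4ERR_STALE_CLIENTID":
        return list(NEW_CLIENT_ID_STEPS)
    if error == "NFS4ERR_BADSESSION":
        steps = ["CREATE_SESSION"]
    elif error == "NFS4ERR_DEADSESSION":
        # Pending requests must first be resolved as performed or not.
        steps = ["resolve pending operations", "CREATE_SESSION"]
    else:
        raise ValueError("not a state-loss indication: " + error)
    if create_session_error == "NFS4ERR_STALE_CLIENTID":
        steps += NEW_CLIENT_ID_STEPS
    return steps
```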
| <section anchor="reclaim_locks" numbered="true" toc="default"> | ||||
| <name>State Reclaim</name> | ||||
| <t> | ||||
| When state information and the associated locks are lost | ||||
| as a result of a server restart, the protocol must provide | ||||
| a way to cause that state to be re-established. The | ||||
| approach used is to define, for most types of locking | ||||
| state (layouts are an exception), a request whose function | ||||
| is to allow the client to | ||||
| re-establish on the server a lock first obtained from a | ||||
| previous instance. Generally, these requests are variants | ||||
| of the requests normally used to create locks of that type | ||||
| and are referred to as "reclaim-type" requests, and the process | ||||
| of re-establishing such locks is referred to as "reclaiming" | ||||
| them. | ||||
| </t> | ||||
| <t anchor="read_write_grace"> | ||||
| Because each client must have an opportunity to reclaim | ||||
| all of the locks that it has without the possibility that | ||||
| some other client will be granted a conflicting lock, | ||||
| a "grace period" is devoted | ||||
| to the reclaim process. During this period, requests | ||||
| creating client IDs and | ||||
| sessions are handled normally, but locking requests are | ||||
| subject to special restrictions. Only | ||||
| reclaim-type locking requests are allowed, unless the | ||||
| server can reliably determine (through state | ||||
| persistently maintained across restart instances) that | ||||
| granting any such lock cannot possibly conflict with a | ||||
| subsequent reclaim. | ||||
| When a request is made to obtain | ||||
| a new lock (i.e., not a reclaim-type request) during the | ||||
| grace period and such a determination cannot be made, | ||||
| the server must return the error NFS4ERR_GRACE. | ||||
| </t> | ||||
| <t> | ||||
| Once a session is established using the new client ID, the | ||||
| client will use reclaim-type locking requests (e.g., LOCK | ||||
| operations with reclaim set to TRUE and OPEN operations with a | ||||
| claim type of CLAIM_PREVIOUS; see | ||||
| <xref target="open_br_reclaim" format="default"/>) to re-establish its locking | ||||
| state. Once this is done, or if there is no such locking | ||||
| state to reclaim, the client sends a global RECLAIM_COMPLETE | ||||
| operation, i.e., one with the rca_one_fs argument set to FALSE, to | ||||
| indicate that it has reclaimed all of the locking state that | ||||
| it will reclaim. Once a client sends such a RECLAIM_COMPLETE | ||||
| operation, it may attempt non-reclaim locking operations, | ||||
| although it might get an NFS4ERR_GRACE status result from each such operation until | ||||
| the period of special handling is over. | ||||
| See <xref target="SEC11-EFF-lock" format="default"/> for a discussion of the | ||||
| analogous handling of lock reclamation in the case of file systems | ||||
| transitioning from server to server. | ||||
| </t> | ||||
| <t> | ||||
| During the grace period, the server must reject READ | ||||
| and WRITE operations | ||||
| and non-reclaim locking requests (i.e., other LOCK | ||||
| and OPEN operations) with an error of NFS4ERR_GRACE, | ||||
| unless it can guarantee that these may be done | ||||
| safely, as described below. | ||||
| </t> | ||||
| <t> | ||||
| The grace period may last until all clients that are known to | ||||
| possibly have had locks have done a global RECLAIM_COMPLETE operation, indicating | ||||
| that they have finished reclaiming the locks they held before | ||||
| the server restart. This means that a client that has done a | ||||
| RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE | ||||
| when attempting to acquire new locks. | ||||
| In order for the server to know that all clients with possible prior | ||||
| lock state have done a RECLAIM_COMPLETE, | ||||
| the server must maintain in stable | ||||
| storage a list of clients that may have such locks. The server | ||||
| may also terminate the grace period before all clients have | ||||
| done a global RECLAIM_COMPLETE. The server <bcp14>SHOULD NOT</bcp14> terminate the | ||||
| grace period before a time equal to the lease period in order | ||||
| to give clients an opportunity to find out about the server | ||||
| restart, as a result of sending requests on associated | ||||
| sessions with a frequency governed by the lease time. | ||||
| Note that when a client does not send such requests (or they | ||||
| are sent by the client but not received by the server), | ||||
| it is possible for the grace period to expire before the client | ||||
| finds out that the server restart has occurred. | ||||
| </t> | ||||
| <t> | ||||
| Some additional time may be added to the lease time in | ||||
| order to allow a client to | ||||
| establish a new client ID and session and to effect lock | ||||
| reclaims. Note that | ||||
| analogous rules apply to | ||||
| file system-specific grace periods discussed in | ||||
| <xref target="SEC11-EFF-lock" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| If the server can reliably determine that granting a non-reclaim | ||||
| request will not conflict with reclamation of locks by other | ||||
| clients, the NFS4ERR_GRACE error does not have to be returned | ||||
| even within the grace period, although NFS4ERR_GRACE must always | ||||
| be returned to clients attempting a non-reclaim lock request | ||||
| before doing their own global RECLAIM_COMPLETE. | ||||
| For the server to be able | ||||
| to service READ and WRITE operations during the grace period, it must | ||||
| again be able to guarantee that no possible conflict could arise | ||||
| between a potential reclaim locking request and the READ or WRITE | ||||
| operation. If the server is unable to offer that guarantee, the | ||||
| NFS4ERR_GRACE error must be returned to the client. | ||||
| </t> | ||||
| <t> | ||||
| For a server to provide simple, valid handling during the grace | ||||
| period, the easiest method is to simply reject all non-reclaim locking | ||||
| requests and READ and WRITE operations by returning the NFS4ERR_GRACE | ||||
| error. However, a server may keep information about granted locks in | ||||
| stable storage. With this information, the server could determine if | ||||
| a locking, READ or WRITE operation can be safely processed. | ||||
| </t> | ||||
| <t> | ||||
| For example, if the server maintained on stable storage summary | ||||
| information on whether mandatory locks exist, either mandatory | ||||
| byte-range locks, or share reservations specifying deny modes, | ||||
| many requests could be allowed during the grace period. If it | ||||
| is known that no such share reservations exist, OPEN requests that | ||||
| do not specify deny modes may be safely granted. If, in addition, | ||||
| it is known that no mandatory byte-range locks exist, either | ||||
| through information stored on stable storage or simply because | ||||
| the server does not support such locks, READ and WRITE operations | ||||
| may be safely processed during the grace period. | ||||
| Another important case is where it is known that no mandatory | ||||
| byte-range locks exist, either because the server does not | ||||
| provide support for them or because their absence is known | ||||
| from persistently recorded data. In this case, READ and | ||||
| WRITE operations specifying stateids derived from reclaim-type | ||||
| operations may be validly processed during the grace period | ||||
| because of the fact that the valid reclaim ensures that no lock | ||||
| subsequently granted can prevent the I/O. | ||||
| </t> | ||||
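One way a server could turn such stable-storage summary information into a grace-period decision is sketched below. The `StableSummary` record, the request kinds, and the conservative refusal of all non-reclaim LOCK requests are assumptions of this sketch; a return value of False corresponds to replying with NFS4ERR_GRACE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StableSummary:
    """Summary kept on stable storage (hypothetical); True means 'may exist'."""
    deny_reservations_possible: bool = True
    mandatory_locks_possible: bool = True


def allowed_during_grace(kind: str,
                         summary: StableSummary,
                         has_deny_mode: bool = False,
                         reclaim_complete_done: bool = False,
                         stateid_from_reclaim: bool = False) -> bool:
    """Decide whether a request may be processed during the grace period."""
    if kind == "reclaim":
        return True
    if kind == "open":
        # Non-reclaim opens are refused until the client's own global
        # RECLAIM_COMPLETE, and only opens without deny modes qualify.
        return (reclaim_complete_done
                and not has_deny_mode
                and not summary.deny_reservations_possible)
    if kind in ("read", "write"):
        if summary.mandatory_locks_possible:
            return False
        # Safe if no deny reservations can exist, or if the stateid was
        # derived from a successful reclaim-type operation.
        return stateid_from_reclaim or not summary.deny_reservations_possible
    # Any other request (e.g., a non-reclaim LOCK) is refused in this sketch.
    return False
```

With the default `StableSummary()`, nothing is known, so only reclaim-type requests are allowed, which matches the simple handling described earlier.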
| <t> | ||||
| To reiterate, a server that allows non-reclaim lock and I/O | ||||
| requests to be processed during the grace period <bcp14>MUST</bcp14> determine | ||||
| that no lock subsequently reclaimed will be rejected and that no lock | ||||
| subsequently reclaimed would have prevented any I/O operation | ||||
| processed during the grace period. | ||||
| </t> | ||||
| <t> | ||||
| Clients should be prepared for the return of NFS4ERR_GRACE errors for | ||||
| non-reclaim lock and I/O requests. In this case, the client should | ||||
| employ a retry mechanism for the request. A delay (on the order of | ||||
| several seconds) between retries should be used to avoid overwhelming | ||||
| the server. Further discussion of the general issue is included in | ||||
| <xref target="Floyd" format="default"/>. The client must account for the server that | ||||
| can perform I/O and non-reclaim locking requests within the grace period | ||||
| as well as those that cannot do so. | ||||
| </t> | ||||
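A client-side retry loop of the kind described above might look like the following sketch. The function name, the five-second default delay, the attempt limit, and the use of status strings are all assumptions for illustration; the `sleep` parameter is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def call_with_grace_retry(op: Callable[[], str],
                          delay_seconds: float = 5.0,
                          max_attempts: int = 10,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry op while it returns NFS4ERR_GRACE, pausing between attempts.

    op is a hypothetical callable that sends the request and returns the
    resulting status name.
    """
    for attempt in range(max_attempts):
        status = op()
        if status != "NFS4ERR_GRACE":
            return status
        if attempt + 1 < max_attempts:
            # Delay to avoid overwhelming a server that is still in grace.
            sleep(delay_seconds)
    return "NFS4ERR_GRACE"
```

A server that can process the request within the grace period simply returns a non-GRACE status on the first attempt, so the same loop covers both kinds of server.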
| <t> | ||||
| A reclaim-type locking request outside the server's grace period | ||||
| can only succeed if the server can guarantee that no conflicting | ||||
| lock or I/O request has been granted since restart. | ||||
| </t> | ||||
| <t> | ||||
| A server may, upon restart, establish a new value for the lease | ||||
| period. Therefore, clients should, once a new client ID is | ||||
| established, refetch the lease_time attribute and use it as the basis | ||||
| for lease renewal for the lease associated with that server. However, | ||||
| the server must establish, for this restart event, a grace period at | ||||
| least as long as the lease period for the previous server | ||||
| instantiation. This allows the client state obtained during the | ||||
| previous server instance to be reliably re-established. | ||||
| </t> | ||||
| <t> | ||||
| The possibility exists that, because of server configuration | ||||
| events, the client will be communicating with a server | ||||
| different than the one on which the locks were obtained, as | ||||
| shown by the combination of eir_server_scope and | ||||
| eir_server_owner. This leads to the issue of if and when | ||||
| the client should attempt to reclaim locks previously obtained | ||||
| on what is being reported as a different server. The rules | ||||
| to resolve this question are as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the server scope is different, the client should not | ||||
| attempt to reclaim locks. In this situation, no lock | ||||
| reclaim is possible. Any attempt to re-obtain the locks | ||||
| with non-reclaim operations is problematic since there is | ||||
| no guarantee that the existing filehandles will be recognized | ||||
| by the new server, or that if recognized, they denote the | ||||
| same objects. It is best to treat the locks as having been | ||||
| revoked by the reconfiguration event. | ||||
| </li> | ||||
| <li> | ||||
| If the server scope is the same, the client should attempt | ||||
| to reclaim locks, even if the eir_server_owner value is | ||||
| different. In this situation, it is the responsibility | ||||
| of the server to return NFS4ERR_NO_GRACE if it cannot | ||||
| provide correct support for lock reclaim operations, | ||||
| including the prevention of edge conditions. | ||||
| </li> | ||||
| </ul> | ||||
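The two rules above reduce to a comparison of server scopes, as in this minimal sketch. The function name and the returned labels are hypothetical; the scope values are assumed to be the eir_server_scope byte strings seen before and after the reconfiguration event.

```python
def reclaim_decision(old_server_scope: bytes, new_server_scope: bytes) -> str:
    """Decide how to treat locks obtained before a server change."""
    if old_server_scope != new_server_scope:
        # Filehandles may be unrecognized or denote different objects.
        return "treat-locks-as-revoked"
    # Same scope: attempt reclaim; the server answers NFS4ERR_NO_GRACE
    # if it cannot support the reclaim correctly.
    return "attempt-reclaim"
```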
| <t> | ||||
| The eir_server_owner field is not used in making this | ||||
| determination. Its function is to specify trunking | ||||
| possibilities for the client (see <xref target="Trunking" format="default"/>) | ||||
| and not to control lock reclaim. | ||||
| </t> | ||||
| <section anchor="reclaim_security_considerations" numbered="true" toc="default"> | ||||
| <name>Security Considerations for State Reclaim</name> | ||||
| <t> | ||||
| During the grace period, a client can reclaim state that it believes or | ||||
| asserts it had before the server restarted. Unless the server | ||||
| maintained a complete record of all the state the client had, | ||||
| the server has little choice but to trust the client. (Of course, | ||||
| if the server maintained a complete record, then it would not | ||||
| have to force the client to reclaim state after server restart.) | ||||
| While the server has to trust the client to tell the truth, the | ||||
| negative consequences for security are limited to enabling | ||||
| denial-of-service attacks in situations in which AUTH_SYS is | ||||
| supported. The | ||||
| fundamental rule for the server when processing reclaim requests | ||||
| is that it <bcp14>MUST NOT</bcp14> grant the reclaim if an equivalent non-reclaim | ||||
| request would not be granted during steady state due to access | ||||
| control or access conflict issues. For example, an OPEN request | ||||
| during a reclaim will be refused with NFS4ERR_ACCESS if the principal making | ||||
| the request does not have access to open the file according to the | ||||
| discretionary ACL (<xref target="attrdef_dacl" format="default"/>) on the file. | ||||
| </t> | ||||
| <t> | ||||
| Nonetheless, it is possible that a client operating in error or | ||||
| maliciously could, during reclaim, prevent another client from | ||||
| reclaiming access to state. For example, an attacker could | ||||
| send an OPEN reclaim operation with a deny mode that prevents | ||||
| another client from reclaiming the OPEN state it had before the | ||||
| server restarted. | ||||
| The attacker could perform the same denial of service during | ||||
| steady state prior to server restart, as long as the | ||||
| attacker had permissions. Given that the attack | ||||
| vectors are equivalent, the grace period does not offer any | ||||
| additional opportunity for denial of service, and any concerns | ||||
| about this attack vector, whether during grace or steady state, | ||||
| are addressed the same way: use RPCSEC_GSS for authentication | ||||
| and limit access to the file only to principals that the owner of | ||||
| the file trusts. | ||||
| </t> | ||||
| <t> | ||||
| Note that if prior to restart the server had client | ||||
| IDs with the EXCHGID4_FLAG_BIND_PRINC_STATEID (<xref target="OP_EXCHANGE_ID" format="default"/>) capability set, then the server | ||||
| <bcp14>SHOULD</bcp14> record in stable storage the client owner and the | ||||
| principal that established the client ID via EXCHANGE_ID. | ||||
| If the server does not, then there is a risk a client will | ||||
| be unable to reclaim state if it does not have a credential | ||||
| for a principal that was originally authorized to | ||||
| establish the state. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Security Considerations for State Reclaim" --> | ||||
| </section> | ||||
| <!-- [auth] "State Reclaim" --> | ||||
| </section> | ||||
| <!-- [auth] "Server Failure and Recovery" --> | ||||
| <section anchor="network_partitions_and_recovery" numbered="true" toc="default"> | ||||
| <name>Network Partitions and Recovery</name> | ||||
| <t> | ||||
| If the duration of a network partition is greater than the lease | ||||
| period provided by the server, the server will not have received a | ||||
| lease renewal from the client. If this occurs, the server may free | ||||
| all locks held for the client or it may allow the lock state to | ||||
| remain for a considerable period, subject to the constraint that | ||||
| if a request for a conflicting lock is made, locks associated with | ||||
| an expired lease do not prevent such a conflicting lock from being | ||||
| granted but <bcp14>MUST</bcp14> be revoked as necessary so as to avoid interfering with | ||||
| such conflicting requests. | ||||
| </t> | ||||
| <t> | ||||
| If the server chooses to delay freeing of lock state until there | ||||
| is a conflict, it may either free all of the client's locks once | ||||
| there is a conflict or it may only revoke the minimum set of locks | ||||
| necessary to allow conflicting requests. When it adopts the | ||||
| finer-grained approach, it must revoke all locks associated with a | ||||
| given stateid, even if the conflict is with only a subset of locks. | ||||
| </t> | ||||
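| <t> | ||||
| The finer-grained approach can be illustrated with a minimal sketch. The Stateid record, the half-open byte ranges, and the function names below are hypothetical and are not part of the protocol; for brevity the sketch also ignores the distinction between shared and exclusive locks. It shows only that revocation is decided per stateid, so every lock under a stateid is revoked even when a single range conflicts. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass
class Stateid:
    """Hypothetical server-side record: a stateid and the byte ranges it covers."""
    name: str
    ranges: list  # list of (start, end) half-open byte ranges


def overlaps(a, b):
    """True when two half-open byte ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def stateids_to_revoke(expired_stateids, requested_range, revoke_all=False):
    """Pick the stateids of an expired lease to revoke for a conflicting request.

    revoke_all=True models freeing all of the client's locks on first conflict.
    Otherwise only stateids with at least one conflicting range are chosen,
    and each chosen stateid is revoked as a whole.
    """
    if revoke_all:
        return [s.name for s in expired_stateids]
    return [
        s.name
        for s in expired_stateids
        if any(overlaps(r, requested_range) for r in s.ranges)
    ]
```

| <t> | ||||
| In this sketch, a request for bytes 5 through 7 that conflicts with only one of two ranges held under a stateid still selects that entire stateid for revocation. | ||||
| </t> | ||||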
| <t> | ||||
| When the server chooses to free all of a client's lock state, either | ||||
| immediately upon lease expiration or as a result of the first | ||||
| attempt to obtain a conflicting lock, the server may report the | ||||
| loss of lock state in a number of ways. | ||||
| </t> | ||||
| <t> | ||||
| The server may choose to invalidate the session and the associated | ||||
| client ID. In this case, once the client can communicate | ||||
| with the server, it will receive an NFS4ERR_BADSESSION error. Upon | ||||
| attempting to create a new session, it would get an | ||||
| NFS4ERR_STALE_CLIENTID. Upon creating the new client ID and new | ||||
| session, the client will attempt to reclaim locks. Normally, the | ||||
| server will not allow the client to reclaim locks, because the | ||||
| server will not be in its recovery grace period. | ||||
| </t> | ||||
| <t> | ||||
| Another possibility is for the server to maintain the session and | ||||
| client ID but for all stateids held by the | ||||
| client to become invalid or stale. Once the client can reach | ||||
| the server after such a network partition, the status returned by | ||||
| the SEQUENCE operation will indicate a loss of locking state; i.e., | ||||
| the flag SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in | ||||
| sr_status_flags. In | ||||
| addition, all I/O submitted by the | ||||
| client with the now invalid stateids will fail with the server | ||||
| returning the error NFS4ERR_EXPIRED. Once the client learns of | ||||
| the loss of locking state, it | ||||
| will suitably notify the applications that held the invalidated | ||||
| locks. The client should then take action to free invalidated | ||||
| stateids, either by establishing a new client ID using a new | ||||
| verifier or by doing a FREE_STATEID operation to release each | ||||
| of the invalidated stateids. | ||||
| </t> | ||||
| <t> | ||||
| When the server adopts a finer-grained approach to revocation | ||||
| of locks when a client's lease has expired, only a subset of stateids | ||||
| will normally become invalid during a network partition. | ||||
| When the client can communicate with the server after such a | ||||
| network partition heals, the status returned by the SEQUENCE | ||||
| operation will indicate a partial loss of locking state | ||||
| (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). | ||||
| In addition, operations, including I/O submitted by the | ||||
| client, with the now invalid stateids will fail with the server | ||||
| returning the error NFS4ERR_EXPIRED. Once the client learns of | ||||
| the loss of locking state, it will use the TEST_STATEID operation | ||||
| on all of its stateids to | ||||
| determine which locks have been lost and then | ||||
| suitably notify the applications that held the invalidated | ||||
| locks. The client can then release the invalidated locking | ||||
| state and acknowledge the revocation of the associated locks | ||||
| by doing a FREE_STATEID operation on each of the invalidated | ||||
| stateids. | ||||
| </t> | ||||
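| <t> | ||||
| The client-side handling described in the two preceding paragraphs can be sketched as follows. The flag values are those defined for sr_status_flags in the SEQUENCE operation; the test_stateid and free_stateid callables are hypothetical stand-ins for sending the TEST_STATEID and FREE_STATEID operations, and the string status values are illustrative. The sketch takes the FREE_STATEID path in both cases, although establishing a new client ID is also permitted when all state has been revoked. | ||||
| </t> | ||||

```python
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010


def recover_after_partition(sr_status_flags, stateids, test_stateid, free_stateid):
    """Return the stateids found to be lost, after releasing each of them.

    test_stateid(s) -> status string and free_stateid(s) are hypothetical
    callables standing in for the corresponding NFSv4.1 operations.
    """
    if sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED:
        # Every stateid is invalid; no need to test them one by one.
        lost = list(stateids)
    elif sr_status_flags & SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED:
        # Only a subset was revoked; probe each stateid to find which.
        lost = [s for s in stateids if test_stateid(s) != "NFS4_OK"]
    else:
        lost = []
    for s in lost:
        # Acknowledge the revocation and release the invalidated state.
        free_stateid(s)
    return lost
```

| <t> | ||||
| The caller would use the returned list to notify the applications that held the invalidated locks. | ||||
| </t> | ||||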
| <t> | ||||
| When a network partition is combined with a server restart, there are | ||||
| edge conditions that place requirements on the server in order to | ||||
| avoid silent data corruption following the server restart. Two of these | ||||
| edge conditions are known, and are discussed below. | ||||
| </t> | ||||
| <t> | ||||
| The first edge condition arises as a result of scenarios such as | ||||
| the following: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Client A acquires a lock. | ||||
| </li> | ||||
| <li> | ||||
| Client A and server experience mutual network partition, such that | ||||
| client A is unable to renew its lease. | ||||
| </li> | ||||
| <li> | ||||
| Client A's lease expires, and the server releases the lock. | ||||
| </li> | ||||
| <li> | ||||
| Client B acquires a lock that would have conflicted | ||||
| with that of client A. | ||||
| </li> | ||||
| <li> | ||||
| Client B releases its lock. | ||||
| </li> | ||||
| <li> | ||||
| Server restarts. | ||||
| </li> | ||||
| <li> | ||||
| Network partition between client A and server heals. | ||||
| </li> | ||||
| <li> | ||||
| Client A connects to a new server instance and finds out about | ||||
| server restart. | ||||
| </li> | ||||
| <li> | ||||
| Client A reclaims its lock within the server's grace period. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| Thus, at the final step, the server has erroneously granted client A's | ||||
| lock reclaim. If client B modified the object the lock was protecting, | ||||
| client A will experience object corruption. | ||||
| </t> | ||||
| <t> | ||||
| The second known edge condition arises in situations such as the following: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Client A acquires one or more locks. | ||||
| </li> | ||||
| <li> | ||||
| Server restarts. | ||||
| </li> | ||||
| <li> | ||||
| Client A and server experience mutual network | ||||
| partition, such that client A is unable to reclaim | ||||
| all of its locks within the grace period. | ||||
| </li> | ||||
| <li> | ||||
| Server's reclaim grace period ends. Client A has either | ||||
| no locks or an incomplete set of locks known to the server. | ||||
| </li> | ||||
| <li> | ||||
| Client B acquires a lock that would have conflicted | ||||
| with a lock of client A that was not reclaimed. | ||||
| </li> | ||||
| <li> | ||||
| Client B releases the lock. | ||||
| </li> | ||||
| <li> | ||||
| Server restarts a second time. | ||||
| </li> | ||||
| <li> | ||||
| Network partition between client A and server heals. | ||||
| </li> | ||||
| <li> | ||||
| Client A connects to a new server instance and finds out about | ||||
| server restart. | ||||
| </li> | ||||
| <li> | ||||
| Client A reclaims its lock within the server's | ||||
| grace period. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| As with the first edge condition, the final step of the scenario of | ||||
| the second edge condition has the server erroneously granting client | ||||
| A's lock reclaim. | ||||
| </t> | ||||
| <t> | ||||
| Solving the first and second edge conditions requires either that the server | ||||
| always assume after it restarts that some edge condition | ||||
| has occurred, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or that the server | ||||
| record some information in stable storage. The amount | ||||
| of information the | ||||
| server records in stable storage is in inverse proportion to how harsh | ||||
| the server intends to be whenever edge conditions arise. | ||||
| The server | ||||
| that is completely tolerant of all edge conditions will record in | ||||
| stable storage every lock that is acquired, removing the lock record | ||||
| from stable storage only when the lock is released. | ||||
| For the two edge conditions discussed above, the harshest a | ||||
| server can be, and still support a grace period for reclaims, requires | ||||
| that the server record in stable storage some minimal | ||||
| information. For example, a server implementation could, for each | ||||
| client, save in stable storage a record containing: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| the co_ownerid field from the client_owner4 presented in the | ||||
| EXCHANGE_ID operation. | ||||
| </li> | ||||
| <li> | ||||
| a boolean that indicates if the client's lease expired | ||||
| or if there was administrative intervention (see | ||||
| <xref target="server_revocation" format="default"/>) to revoke | ||||
| a byte-range lock, share reservation, or delegation and | ||||
| there has been no acknowledgment, via FREE_STATEID, | ||||
| of such revocation. | ||||
| </li> | ||||
| <li> | ||||
| a boolean that indicates whether the client may have locks | ||||
| that it believes to be reclaimable in situations in which the | ||||
| grace period was terminated, making the server's view of | ||||
| lock reclaimability suspect. The server will set this for | ||||
| any client record in stable storage where the client has | ||||
| not done a suitable RECLAIM_COMPLETE (global or file | ||||
| system-specific depending on the target of the lock | ||||
| request) before it grants any new (i.e., not reclaimed) | ||||
| lock to any client. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Assuming the above record keeping, for the first edge condition, after | ||||
| the server restarts, the record that client A's lease expired means | ||||
| that another client could have acquired a conflicting byte-range lock, | ||||
| share reservation, or delegation. Hence, the server must reject a | ||||
| reclaim from client A with the error NFS4ERR_NO_GRACE. | ||||
| </t> | ||||
| <t> | ||||
| For the second edge condition, after the server restarts for a second | ||||
| time, the indication that the client had not completed its | ||||
| reclaims at the time at which the grace period ended | ||||
| means that the server must reject a reclaim from client A | ||||
| with the error NFS4ERR_NO_GRACE. | ||||
| </t> | ||||
| <t> | ||||
| When either edge condition occurs, the client's attempt to reclaim | ||||
| locks will result in the error NFS4ERR_NO_GRACE. When this is | ||||
| received, or after the client restarts with no lock state, the | ||||
| client will send a global RECLAIM_COMPLETE. When | ||||
| the RECLAIM_COMPLETE is received, the server and client are | ||||
| again in agreement regarding reclaimable locks and both booleans in persistent | ||||
| storage can be reset, to be set again only when there is a subsequent | ||||
| event that causes lock reclaim operations to be questionable. | ||||
| </t> | ||||
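| <t> | ||||
| The minimal record keeping discussed above can be sketched as follows. This is an illustration under stated assumptions, not a required implementation: the ClientRecord type, its field names other than co_ownerid, and the function names are hypothetical, and status codes are shown as strings. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical per-client record kept in stable storage."""
    co_ownerid: bytes
    # Lease expired or locks administratively revoked, and the revocation
    # has not yet been acknowledged via FREE_STATEID.
    revoked_unacknowledged: bool = False
    # A new (non-reclaim) lock was granted before this client completed
    # a suitable RECLAIM_COMPLETE.
    reclaim_suspect: bool = False


def reclaim_status(record):
    """Status for a reclaim sent during the grace period after a restart."""
    if record.revoked_unacknowledged or record.reclaim_suspect:
        return "NFS4ERR_NO_GRACE"
    return "NFS4_OK"


def on_reclaim_complete(record):
    """A global RECLAIM_COMPLETE puts client and server back in agreement."""
    record.revoked_unacknowledged = False
    record.reclaim_suspect = False
```

| <t> | ||||
| In the first edge condition, the first boolean would have been set when client A's lease expired; in the second, the second boolean would have been set when client B's lock was granted. Either causes client A's later reclaim to be refused. | ||||
| </t> | ||||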
| <t> | ||||
| Regardless of the level and approach to record keeping, the server | ||||
| <bcp14>MUST</bcp14> implement one of the following strategies (which apply to | ||||
| reclaims of share reservations, byte-range locks, and delegations): | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Reject all reclaims with NFS4ERR_NO_GRACE. This | ||||
| is extremely unforgiving, but necessary if the server does not | ||||
| record lock state in stable storage. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Record sufficient state in stable storage such that | ||||
| all known edge conditions involving server restart, | ||||
| including the two noted in this section, are | ||||
| detected. It is acceptable to erroneously recognize an edge condition | ||||
| and not allow a reclaim, when, with sufficient knowledge, it | ||||
| would be allowed. The error the server would return in this | ||||
| case is NFS4ERR_NO_GRACE. Note that it is not known if there are other | ||||
| edge conditions. | ||||
| </t> | ||||
| <t> | ||||
| In the event that, after a server restart, the server | ||||
| determines there is unrecoverable damage or | ||||
| corruption to the information in stable storage, then for | ||||
| all clients and/or locks that may be affected, the server <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_NO_GRACE. | ||||
| </t> | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | ||||
| outside the scope of this specification, since the strategies for such | ||||
| handling are very dependent on the client's operating environment. | ||||
| However, one potential approach is described below. | ||||
| </t> | ||||
| <t> | ||||
| When the client receives NFS4ERR_NO_GRACE, it could examine the change | ||||
| attribute of the objects for which the client is trying to reclaim state, | ||||
| and use that to determine whether to re-establish the state via normal | ||||
| OPEN or LOCK operations. This is acceptable provided that the client's | ||||
| operating environment allows it. In other words, the client | ||||
| implementor is advised to document this behavior for users. The | ||||
| client could also inform the application that its byte-range lock or share | ||||
| reservations (whether or not they were delegated) have been lost, such | ||||
| as via a UNIX signal, a Graphical User Interface (GUI) pop-up window, etc. | ||||
| See <xref target="data_caching_revocation" format="default"/> | ||||
| for a discussion of how the client should deal | ||||
| with unreclaimed delegations on client state. | ||||
| </t> | ||||
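| <t> | ||||
| The change-attribute approach can be sketched as a simple decision function. This is one possible client policy, not a mandated one; the function name and the returned action strings are hypothetical, and cached_change is assumed to be the change attribute value the client held before the server restart. | ||||
| </t> | ||||

```python
def after_no_grace(cached_change, current_change):
    """Decide how to handle a lock whose reclaim got NFS4ERR_NO_GRACE.

    cached_change: change attribute the client last saw for the object.
    current_change: change attribute fetched after the reclaim was refused.
    """
    if cached_change == current_change:
        # Object appears unmodified: retry with a normal OPEN or LOCK.
        return "reestablish"
    # Object was modified in the meantime: report the lock as lost.
    return "notify_application"
```
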
| <t> | ||||
| For further discussion of revocation of locks, see | ||||
| <xref target="server_revocation" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Network Partitions and Recovery" --> | ||||
| </section> | ||||
| <!-- [auth] "Crash Recovery" --> | ||||
| <section anchor="server_revocation" numbered="true" toc="default"> | ||||
| <name>Server Revocation of Locks</name> | ||||
| <t> | ||||
| At any point, the server can revoke locks held by a client, and the | ||||
| client must be prepared for this event. When the client detects that | ||||
| its locks have been or may have been revoked, the client is | ||||
| responsible for validating the state information between itself and | ||||
| the server. Validating locking state for the client means that it | ||||
| must verify or reclaim state for each lock currently held. | ||||
| </t> | ||||
| <t> | ||||
| The first occasion of lock revocation is upon server | ||||
| restart. Note that this includes situations | ||||
| in which sessions are persistent and locking state is | ||||
| lost. In this class of instances, the client will | ||||
| receive an error (NFS4ERR_STALE_CLIENTID) on an | ||||
| operation that takes a client ID (usually as part of | ||||
| recovery in response to a problem with the current | ||||
| session), and the client will proceed | ||||
| with normal crash recovery as described in <xref target="reclaim_locks" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The second occasion of lock revocation is the inability to renew the lease | ||||
| before expiration, as discussed in | ||||
| <xref target="network_partitions_and_recovery" format="default"/>. While this is | ||||
| considered a rare or unusual event, | ||||
| the client must be prepared to recover. The server is responsible | ||||
| for determining the precise consequences of the lease expiration, | ||||
| informing the client of the scope of the lock revocation decided | ||||
| upon. The client then uses the status information provided | ||||
| by the server in the SEQUENCE results (field sr_status_flags, | ||||
| see <xref target="OP_SEQUENCE_DESCRIPTION" format="default"/>) | ||||
| to synchronize its locking state with that of the | ||||
| server, in order to recover. | ||||
| </t> | ||||
| <t> | ||||
| The third occasion of lock revocation can occur as a result of | ||||
| revocation of locks within the lease period, either because of | ||||
| administrative intervention or because a recallable lock (a | ||||
| delegation or layout) was not returned within the lease period | ||||
| after having been recalled. While these are | ||||
| considered rare events, they are possible, and the client must be | ||||
| prepared to deal with them. When either of these events occurs, | ||||
| the client finds out about the situation through the status returned | ||||
| by the SEQUENCE operation. Any use of stateids associated with | ||||
| locks revoked during the lease period will receive the error | ||||
| NFS4ERR_ADMIN_REVOKED or NFS4ERR_DELEG_REVOKED, as appropriate. | ||||
| </t> | ||||
| <t> | ||||
| In all situations in which a subset of locking state may have been | ||||
| revoked, which include all cases in which locking state is revoked | ||||
| within the lease period, it is up to the client to determine which | ||||
| locks have been revoked and which have not. It does this by | ||||
| using the TEST_STATEID operation on the appropriate set of stateids. | ||||
| Once the set of revoked locks has been determined, the applications | ||||
| can be notified, and the invalidated stateids can be freed and | ||||
| lock revocation acknowledged by using FREE_STATEID. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Server Revocation of Locks" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Short and Long Leases</name> | ||||
| <t> | ||||
| When determining the time period for the server lease, the usual lease | ||||
| trade-offs apply. A short lease is good for fast server recovery at a | ||||
| cost of increased operations to effect lease renewal (when there are | ||||
| no other operations during the period to effect lease renewal as a | ||||
| side effect). A long lease is certainly kinder and gentler to | ||||
| servers trying to handle very large numbers of clients. The number of extra requests | ||||
| to effect lease renewal drops in inverse | ||||
| proportion to the lease time. The disadvantages of a long lease | ||||
| include the possibility of slower recovery after certain failures. | ||||
| After server failure, a longer grace period may be required when | ||||
| some clients do not promptly reclaim their locks and do a | ||||
| global RECLAIM_COMPLETE. In the event of client failure, | ||||
| the longer period for a lease to expire will force conflicting | ||||
| requests to wait longer. | ||||
| </t> | ||||
| <t> | ||||
| A long lease is practical if the server can store lease state in | ||||
| stable storage. Upon recovery, the server can reconstruct the | ||||
| lease state from its stable storage and continue operation with | ||||
| its clients. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Short and Long Leases" --> | ||||
| <section anchor="lease_propagation_delay" numbered="true" toc="default"> | ||||
| <name>Clocks, Propagation Delay, and Calculating Lease Expiration</name> | ||||
| <t> | ||||
| To avoid the need for synchronized clocks, lease times are granted by | ||||
| the server as a time delta. However, there is a requirement that the | ||||
| client and server clocks do not drift excessively over the duration of | ||||
| the lease. There is also the issue of propagation delay across the | ||||
| network, which could easily be several hundred milliseconds, as well as | ||||
| the possibility that requests will be lost and need to be | ||||
| retransmitted. | ||||
| </t> | ||||
| <t> | ||||
| To take propagation delay into account, the client should | ||||
| subtract it from lease times (e.g., if the client estimates the | ||||
| one-way propagation delay as 200 milliseconds, then it can | ||||
| assume that the lease is already 200 milliseconds old when it | ||||
| gets it). In addition, it will take another 200 milliseconds for | ||||
| the client's request to reach the server. So the client must send a | ||||
| lease renewal or write data back to the server at least 400 | ||||
| milliseconds before the lease would expire. If the propagation delay | ||||
| varies over the life of the lease (e.g., the client is on a mobile | ||||
| host), the client will need to continuously subtract the increase | ||||
| in propagation delay from the lease times. | ||||
| </t> | ||||
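| <t> | ||||
| The arithmetic above can be written out as a short sketch. The function and parameter names are hypothetical, times are in integer milliseconds on the client's own clock, and the 90-second lease used in the usage note is an assumed value chosen only for illustration. | ||||
| </t> | ||||

```python
def latest_renewal_send_time(received_at_ms, lease_time_ms, one_way_delay_ms):
    """Latest client-clock time (ms) at which a renewal can still be sent.

    received_at_ms: client clock when the lease grant or renewal arrived.
    The lease is already one_way_delay_ms old on arrival, and the renewal
    needs another one_way_delay_ms to reach the server.
    """
    return received_at_ms + lease_time_ms - 2 * one_way_delay_ms
```

| <t> | ||||
| With an assumed 90-second lease received at time 0 and a 200-millisecond one-way delay, the function returns 89600, i.e., the renewal has to be sent at least 400 milliseconds before the nominal expiration. | ||||
| </t> | ||||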
| <t> | ||||
| The server's lease period configuration should take into account the | ||||
| network distance of the clients that will be accessing the server's | ||||
| resources. It is expected that the lease period will take into | ||||
| account the network propagation delays and other network delay factors | ||||
| for the client population. Since the protocol does not allow for an | ||||
| automatic method to determine an appropriate lease period, the | ||||
| server's administrator may have to tune the lease period. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Clocks, Propagation Delay, and Calculating Lease Expiration" --> | ||||
| <section anchor="vestigial_locking" numbered="true" toc="default"> | ||||
| <name>Obsolete Locking Infrastructure from NFSv4.0</name> | ||||
| <t> | ||||
| There are a number of operations and fields within existing | ||||
| operations that no longer have a function in NFSv4.1. | ||||
| In one way or another, these changes are all due to | ||||
| the implementation of sessions that provide client context | ||||
| and exactly-once semantics as a base feature of the protocol, | ||||
| separate from locking itself. | ||||
| </t> | ||||
| <t> | ||||
| The following NFSv4.0 operations <bcp14>MUST NOT</bcp14> be implemented in NFSv4.1. | ||||
| The server <bcp14>MUST</bcp14> return NFS4ERR_NOTSUPP if these operations are | ||||
| found in an NFSv4.1 COMPOUND. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| SETCLIENTID since its function has been replaced by | ||||
| EXCHANGE_ID. | ||||
| </li> | ||||
| <li> | ||||
| SETCLIENTID_CONFIRM since client ID confirmation now | ||||
| happens by means of CREATE_SESSION. | ||||
| </li> | ||||
| <li> | ||||
| OPEN_CONFIRM because state-owner-based seqids | ||||
| have been replaced by the sequence ID in the | ||||
| SEQUENCE operation. | ||||
| </li> | ||||
| <li> | ||||
| RELEASE_LOCKOWNER because lock-owners with no associated | ||||
| locks do not have any sequence-related state and so can | ||||
| be deleted by the server at will. | ||||
| </li> | ||||
| <li> | ||||
| RENEW because every SEQUENCE operation for a session causes | ||||
| lease renewal, making a separate operation superfluous. | ||||
| </li> | ||||
| </ul> | ||||
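| <t> | ||||
| A server's handling of these operations can be sketched as a check made before an operation in an NFSv4.1 COMPOUND is dispatched. The function name and the use of operation names as strings are hypothetical simplifications; only the list of operations and the NFS4ERR_NOTSUPP result come from the text above. | ||||
| </t> | ||||

```python
# NFSv4.0 operations that MUST NOT be implemented in NFSv4.1.
OBSOLETE_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def precheck_op(op_name):
    """Return the status to use before dispatching an op in a v4.1 COMPOUND."""
    if op_name in OBSOLETE_IN_V41:
        return "NFS4ERR_NOTSUPP"
    return "NFS4_OK"
```
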
| <t> | ||||
| Also, there are a number of fields, present in existing operations, | ||||
| related to locking that have no use in minor version 1. They | ||||
| were used in minor version 0 to perform functions now provided | ||||
| in a different | ||||
| fashion. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Sequence IDs used to sequence requests for a given state-owner | ||||
| and to provide retry protection, now provided | ||||
| via sessions. | ||||
| </li> | ||||
| <li> | ||||
| Client IDs used to identify the client associated with a given | ||||
| request. Client identification is now available using the client ID | ||||
| associated with the current session, without needing an explicit | ||||
| client ID field. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Such vestigial fields in existing operations have no function in | ||||
| NFSv4.1 and are ignored by the server. Note that client IDs in | ||||
| operations new to NFSv4.1 (such as CREATE_SESSION and DESTROY_CLIENTID) | ||||
| are not ignored. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Vestigial Locking Infrastructure From V4.0" --> | ||||
| </section> | ||||
| <!-- [auth] "State Management" --> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="file_locking" numbered="true" toc="default"> | ||||
| <name>File Locking and Share Reservations</name> | ||||
| <t> | ||||
| To support Win32 share reservations, it is necessary to provide | ||||
| operations that atomically open or create files. Having a | ||||
| separate share/unshare operation would not allow correct | ||||
| implementation of the Win32 OpenFile API. In order to | ||||
| correctly implement share semantics, the previous NFS protocol | ||||
| mechanisms used when a file is opened or created (LOOKUP, CREATE, | ||||
| ACCESS) need to be replaced. The NFSv4.1 protocol defines | ||||
| an OPEN operation that is capable of atomically looking up, creating, | ||||
| and locking a file on the server. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Opens and Byte-Range Locks</name> | ||||
| <t> | ||||
| It is assumed that manipulating a byte-range lock is rare when | ||||
| compared to READ | ||||
| and WRITE operations. It is also assumed that server restarts and network | ||||
| partitions are relatively rare. Therefore, it is important that the | ||||
| READ and WRITE operations have a lightweight mechanism to indicate if | ||||
| they possess a held lock. A LOCK operation contains the | ||||
| heavyweight information required to establish a byte-range lock and uniquely | ||||
| define the owner of the lock. | ||||
| </t> | ||||
| <section anchor="state-owner" numbered="true" toc="default"> | ||||
| <name>State-Owner Definition</name> | ||||
| <t> | ||||
| When opening a file or requesting a byte-range lock, the | ||||
| client must specify an identifier that represents the owner of | ||||
| the requested lock. This identifier is in the form of a | ||||
| state-owner, represented in the protocol by a state_owner4, a | ||||
| variable-length opaque array that, when concatenated with the | ||||
| current client ID, uniquely defines the owner of a lock managed | ||||
| by the client. This may be a thread ID, process ID, or other | ||||
| unique value. | ||||
| </t> | ||||
| <t> | ||||
| Owners of opens and owners of byte-range locks are separate | ||||
| entities and remain separate even if the same opaque arrays | ||||
| are used to designate owners of each. The protocol distinguishes | ||||
| between open-owners (represented by open_owner4 structures) | ||||
| and lock-owners (represented by lock_owner4 structures). | ||||
| </t> | ||||
| <t> | ||||
| Each open is associated with a specific open-owner while each | ||||
| byte-range lock is associated with a lock-owner and an | ||||
| open-owner, the latter being the open-owner associated with the | ||||
| open file under which the LOCK operation was done. Delegations | ||||
| and layouts, on the other hand, are not associated with a | ||||
| specific owner but are associated with the client as a whole | ||||
| (identified by a client ID). | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "State-owner Definition" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Use of the Stateid and Locking</name> | ||||
| <t> | ||||
| All READ, WRITE, and SETATTR operations contain a stateid. For the | ||||
| purposes of this section, SETATTR operations that change the size | ||||
| attribute of a file are treated as if they are writing the area | ||||
| between the old and new sizes (i.e., the byte-range truncated or added to the | ||||
| file by means of the SETATTR), even where SETATTR is not explicitly | ||||
| mentioned in the text. The stateid passed to one of these operations must | ||||
| be one that represents an open, a set of byte-range locks, or a | ||||
| delegation, or it may be a special stateid representing anonymous | ||||
| access or the special bypass stateid. | ||||
| </t> | ||||
| <t> | ||||
| If the state-owner performs a READ or WRITE operation in a situation in which | ||||
| it has established a byte-range lock or share reservation | ||||
| on the server (any OPEN constitutes a share reservation), the | ||||
| stateid (previously returned by the server) must be used to | ||||
| indicate what locks, including both byte-range | ||||
| locks and share reservations, are held by the state-owner. If no state | ||||
| is established by the client, either a byte-range lock or a share reservation, | ||||
| a special stateid for anonymous state (zero as the value for "other" and "seqid") | ||||
| is used. (See <xref target="special_stateid" format="default"/> for a description of | ||||
| 'special' stateids in general.) | ||||
| Regardless of whether a stateid for anonymous state | ||||
| or a stateid returned by the server is used, if there is a | ||||
| conflicting share reservation or mandatory byte-range lock held on the | ||||
| file, the server <bcp14>MUST</bcp14> refuse to service the READ or WRITE operation. | ||||
| </t> | ||||
| <t> | ||||
| Share reservations are established by OPEN operations and by their | ||||
| nature are mandatory in that when the OPEN denies READ or WRITE | ||||
| operations, that denial results in such operations being rejected with | ||||
| error NFS4ERR_LOCKED. Byte-range locks may be implemented by the server | ||||
| as either mandatory or advisory, or the choice of mandatory or | ||||
| advisory behavior may be determined by the server on the basis of the | ||||
| file being accessed (for example, some UNIX-based servers support a | ||||
| "mandatory lock bit" on the mode attribute such that if set, byte-range | ||||
| locks are required on the file before I/O is possible). When byte-range | ||||
| locks are advisory, they only prevent the granting of conflicting lock | ||||
| requests and have no effect on READs or WRITEs. Mandatory byte-range | ||||
| locks, however, prevent conflicting I/O operations. When such operations are | ||||
| attempted, they are rejected with NFS4ERR_LOCKED. When the client | ||||
| gets NFS4ERR_LOCKED on a file for which it knows it has the proper share | ||||
| reservation, it will need to send a LOCK operation on the byte-range of | ||||
| the file that includes the byte-range the I/O was to be performed on, with | ||||
| an appropriate locktype field of the LOCK operation's arguments (i.e., READ*_LT for a READ operation, WRITE*_LT | ||||
| for a WRITE operation). | ||||
| </t> | ||||
| <t> | ||||
| Note that for UNIX environments that support mandatory byte-range locking, | ||||
| the distinction between advisory and mandatory locking is subtle. In | ||||
| fact, advisory and mandatory byte-range locks are exactly the same as | ||||
| far as the APIs and requirements on implementation are concerned. If the mandatory | ||||
| lock attribute is set on the file, the server checks to see if the | ||||
| lock-owner has an appropriate shared (READ_LT) or exclusive (WRITE_LT) byte-range | ||||
| lock on the byte-range it wishes to READ from or WRITE to. If there is no | ||||
| appropriate lock, the server checks if there is a conflicting lock | ||||
| (which can be done by attempting to acquire the conflicting lock on | ||||
| behalf of the lock-owner, and if successful, release the lock after | ||||
| the READ or WRITE operation is done), and if there is, the server returns | ||||
| NFS4ERR_LOCKED. | ||||
| </t> | ||||
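| <t> | ||||
| The server-side check described above can be sketched as follows. The ByteRangeLock type and the io_status function are hypothetical; the sketch assumes half-open byte ranges and assumes that an "appropriate" lock is a single lock of the same owner covering the whole I/O range. It tests for a conflict directly rather than by temporarily acquiring a lock. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRangeLock:
    """Hypothetical lock record; the range is [start, end)."""
    owner: str
    start: int
    end: int
    exclusive: bool  # True for WRITE_LT, False for READ_LT


def io_status(locks, owner, start, end, is_write, mandatory):
    """Status a server could return for a READ or WRITE on [start, end)."""
    if not mandatory:
        # Advisory locks have no effect on READs or WRITEs.
        return "NFS4_OK"
    for lk in locks:
        covers = lk.start <= start and end <= lk.end
        # A READ is satisfied by a shared or exclusive lock; a WRITE
        # needs an exclusive one.
        if lk.owner == owner and covers and (lk.exclusive or not is_write):
            return "NFS4_OK"
    for lk in locks:
        overlap = lk.start < end and start < lk.end
        # A WRITE conflicts with any other owner's lock; a READ conflicts
        # only with another owner's exclusive lock.
        if lk.owner != owner and overlap and (is_write or lk.exclusive):
            return "NFS4ERR_LOCKED"
    return "NFS4_OK"
```
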
| <t> | ||||
| For Windows environments, byte-range locks are always mandatory, so the | ||||
| server always checks for byte-range locks during I/O requests. | ||||
| </t> | ||||
| <t> | ||||
| Thus, the LOCK operation does not need to distinguish | ||||
| between advisory and mandatory byte-range locks. It is the | ||||
| server's processing of the READ and WRITE operations that introduces | ||||
| the distinction. | ||||
| </t> | ||||
| <t> | ||||
| Every stateid that is validly passed to READ, WRITE, or SETATTR, | ||||
| with the exception of special stateid values, | ||||
| defines an access mode for the file (i.e., | ||||
| OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or | ||||
| OPEN4_SHARE_ACCESS_BOTH). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For stateids associated with opens, this is the mode defined by | ||||
| the original OPEN that caused the | ||||
| allocation of the OPEN stateid | ||||
| and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the | ||||
| same open-owner/file pair. | ||||
| </li> | ||||
| <li> | ||||
| For stateids returned by byte-range LOCK operations, | ||||
| the appropriate mode is the access mode for the OPEN | ||||
| stateid associated with the lock set represented by the stateid. | ||||
| </li> | ||||
| <li> | ||||
| For delegation stateids, the access mode is based on the type of delegation. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When a READ, WRITE, or SETATTR (that specifies the | ||||
| size attribute) operation is done, the operation is subject to checking against | ||||
| the access mode to verify that the operation is appropriate given the | ||||
| stateid with which the operation is associated. | ||||
| </t> | ||||
| <t> | ||||
| In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that | ||||
| set size), the server <bcp14>MUST</bcp14> verify that the access mode allows writing | ||||
| and <bcp14>MUST</bcp14> return an NFS4ERR_OPENMODE error if it does not. In the case of | ||||
| READ, the server may perform the corresponding check on the access | ||||
| mode, or it may choose to allow READ on OPENs for OPEN4_SHARE_ACCESS_WRITE, to | ||||
| accommodate clients whose WRITE implementation may unavoidably do | ||||
| reads (e.g., due to buffer cache constraints). However, even if READs | ||||
| are allowed in these circumstances, the server <bcp14>MUST</bcp14> still check for | ||||
| locks that conflict with the READ (e.g., another OPEN specified OPEN4_SHARE_DENY_READ or OPEN4_SHARE_DENY_BOTH). Note that a server that does enforce the access mode check | ||||
| on READs need not explicitly check for conflicting share reservations | ||||
| since the existence of OPEN for OPEN4_SHARE_ACCESS_READ guarantees that no | ||||
| conflicting share reservation can exist. | ||||
| </t> | ||||
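The access-mode checks described above can be sketched as follows. This is a minimal illustration, not a server implementation: the function name `check_access_mode`, the `strict_read_check` switch, and the use of strings for status codes are assumptions made for the example. Lock and share-reservation conflict checks, which the server still has to perform, are not modeled.

```python
# Access-mode bits, with the values the protocol defines for OPEN.
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

# Status codes are represented as strings here purely for illustration.
NFS4_OK = "NFS4_OK"
NFS4ERR_OPENMODE = "NFS4ERR_OPENMODE"


def check_access_mode(op, access_mode, strict_read_check=True):
    """Check an I/O operation against the access mode implied by its stateid.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size).
    strict_read_check is a hypothetical server policy switch: when False,
    READ is allowed on a write-only open.
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations: the access mode must allow writing.
        if not (access_mode & OPEN4_SHARE_ACCESS_WRITE):
            return NFS4ERR_OPENMODE
        return NFS4_OK
    if op == "READ":
        # For READ, the server may or may not enforce the access mode.
        if strict_read_check and not (access_mode & OPEN4_SHARE_ACCESS_READ):
            return NFS4ERR_OPENMODE
        return NFS4_OK
    raise ValueError("unsupported operation: %r" % (op,))
```

With `strict_read_check=False`, the sketch models a server that accommodates clients whose WRITE path may unavoidably read.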
| <t> | ||||
| The READ bypass special stateid (all bits of "other" and "seqid" set | ||||
| to one) | ||||
| indicates a desire to bypass locking checks. The server <bcp14>MAY</bcp14> | ||||
| allow READ operations to bypass | ||||
| locking checks at the server, when this special stateid is used. | ||||
| However, WRITE operations with | ||||
| this special stateid value <bcp14>MUST NOT</bcp14> bypass locking checks and are | ||||
| treated exactly the same as if a special stateid for anonymous state | ||||
| were used. | ||||
| </t> | ||||
| <t> | ||||
| A lock may not be granted while a READ or WRITE operation using one of | ||||
| the special stateids is being performed and the scope of the lock | ||||
| to be granted would conflict with the READ or WRITE operation. | ||||
| This can occur when: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A mandatory byte-range lock is requested with a byte-range that | ||||
| conflicts with the byte-range of the READ or WRITE operation. | ||||
| For the purposes of this paragraph, a conflict occurs when | ||||
| a shared lock is requested and a WRITE operation is being | ||||
| performed, or an exclusive lock is requested and either a | ||||
| READ or a WRITE operation is being performed. | ||||
| </li> | ||||
| <li> | ||||
| A share reservation is requested that denies reading and/or | ||||
| writing and the corresponding operation is being performed. | ||||
| </li> | ||||
| <li> | ||||
| A delegation is to be granted and the delegation type would | ||||
| prevent the I/O operation, i.e., READ and WRITE conflict with | ||||
| an OPEN_DELEGATE_WRITE delegation and WRITE conflicts with an OPEN_DELEGATE_READ delegation. | ||||
| </li> | ||||
| </ul> | ||||
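The byte-range case in the list above can be sketched as a predicate. This is an illustration under stated assumptions: the helper names are hypothetical, byte-ranges are given as an offset plus a finite, non-zero length, and the share-reservation and delegation cases are not modeled.

```python
READ_LT = "READ_LT"    # shared byte-range lock
WRITE_LT = "WRITE_LT"  # exclusive byte-range lock


def ranges_overlap(off_a, len_a, off_b, len_b):
    """Return True when two finite, non-empty byte-ranges share a byte."""
    return off_a < off_b + len_b and off_b < off_a + len_a


def lock_conflicts_with_io(lock_type, lock_off, lock_len, io_op, io_off, io_len):
    """Decide whether a requested mandatory lock conflicts with in-progress I/O.

    io_op is "READ" or "WRITE", performed with one of the special stateids.
    """
    if not ranges_overlap(lock_off, lock_len, io_off, io_len):
        return False
    if lock_type == READ_LT:
        # A shared lock conflicts only with a WRITE in progress.
        return io_op == "WRITE"
    if lock_type == WRITE_LT:
        # An exclusive lock conflicts with either kind of I/O.
        return io_op in ("READ", "WRITE")
    raise ValueError("unsupported lock type: %r" % (lock_type,))
```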
| <t> | ||||
| When a client holds a delegation, it needs to ensure | ||||
| that the stateid sent conveys the association of the | ||||
| operation with the delegation, to prevent the delegation from | ||||
| being recalled unnecessarily. When the delegation stateid, | ||||
| an open stateid associated with that delegation, or a stateid | ||||
| representing byte-range locks derived from such an open is | ||||
| used, the server knows that the READ, WRITE, or SETATTR | ||||
| does not conflict with the delegation but is sent under | ||||
| the aegis of the delegation. Even though it is possible | ||||
| for the server to determine from the client ID (via | ||||
| the session ID) that the client does in fact have a | ||||
| delegation, the server is not obliged to check this, so | ||||
| using a special stateid can result in avoidable recall | ||||
| of the delegation. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Use of the Stateid and Locking" --> | ||||
| </section> | ||||
| <!-- [auth] "Opens and Byte-Range Locks" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Lock Ranges</name> | ||||
| <t> | ||||
| The protocol allows a lock-owner to request a lock with a byte-range | ||||
| and then either upgrade, downgrade, or unlock a sub-range of | ||||
| the initial lock, or a byte-range that | ||||
| overlaps -- fully or partially -- either with that initial lock or a | ||||
| combination of a set of existing locks for the same lock-owner. It | ||||
| is expected that this will be an uncommon type of request. In any | ||||
| case, servers or server file systems may not be able to support | ||||
| sub-range lock semantics. In the event that a server receives a | ||||
| locking request that represents a sub-range of current locking state | ||||
| for the lock-owner, the server is allowed to return the error | ||||
| NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock | ||||
| operations. Therefore, the client should be prepared to receive this | ||||
| error and, if appropriate, report the error to the requesting | ||||
| application. | ||||
| </t> | ||||
| <t> | ||||
| The client is discouraged from combining multiple independent locking | ||||
| ranges that happen to be adjacent into a single request since the | ||||
| server may not support sub-range requests for reasons related to | ||||
| the recovery of byte-range locking state in the event of server failure. As | ||||
| discussed in <xref target="server_failure" format="default"/>, the | ||||
| server may employ certain optimizations during recovery that work | ||||
| effectively only when the client's behavior during lock recovery is | ||||
| similar to the client's locking behavior prior to server failure. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Lock Ranges" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Upgrading and Downgrading Locks</name> | ||||
| <t> | ||||
| If a client has a WRITE_LT lock on a byte-range, it can request an atomic | ||||
| downgrade of the lock to a READ_LT lock via the LOCK operation, by setting | ||||
| the type to READ_LT. If the server supports atomic downgrade, the | ||||
| request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. The | ||||
| client should be prepared to receive this error and, if appropriate, | ||||
| report the error to the requesting application. | ||||
| </t> | ||||
| <t> | ||||
| If a client has a READ_LT lock on a byte-range, it can request an atomic | ||||
| upgrade of the lock to a WRITE_LT lock via the LOCK operation by setting | ||||
| the type to WRITE_LT or WRITEW_LT. If the server does not support | ||||
| atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade | ||||
| can be achieved without an existing conflict, the request will | ||||
| succeed. Otherwise, the server will return either NFS4ERR_DENIED or | ||||
| NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the client | ||||
| sent the LOCK operation with the type set to WRITEW_LT and the server | ||||
| has detected a deadlock. The client should be prepared to receive such | ||||
| errors and, if appropriate, report the error to the requesting | ||||
| application. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Upgrading and Downgrading Locks" --> | ||||
| <section anchor="byte_range_seqid" numbered="true" toc="default"> | ||||
| <name>Stateid Seqid Values and Byte-Range Locks</name> | ||||
| <t> | ||||
| When a LOCK or LOCKU operation is performed, | ||||
| the stateid returned has the same "other" value as the argument's | ||||
| stateid, and a | ||||
| "seqid" value that is incremented (relative to the argument's | ||||
| stateid) to reflect the occurrence | ||||
| of the LOCK or LOCKU operation. The server <bcp14>MUST</bcp14> increment | ||||
| the value of the "seqid" field whenever there is any change | ||||
| to the locking status of any byte offset as described by | ||||
| any of the locks covered by the stateid. A change in locking | ||||
| status includes a change from locked to unlocked or the reverse or | ||||
| a change from being locked for READ_LT to being locked for WRITE_LT | ||||
| or the reverse. | ||||
| </t> | ||||
| <t> | ||||
| When there is no such change, as, for example, when a range | ||||
| already locked for WRITE_LT is locked again for WRITE_LT, the | ||||
| server <bcp14>MAY</bcp14> increment the "seqid" value. | ||||
| </t> | ||||
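The increment rule above can be sketched as follows. The representation is an assumption made for illustration: locking status is a dictionary from byte offset to lock type, with unlocked bytes simply absent, and `bump_when_unchanged` is a hypothetical policy flag for the case in which the server is permitted, but not required, to increment. Wraparound of the 32-bit field is not modeled.

```python
def next_lock_seqid(seqid, status_before, status_after, bump_when_unchanged=False):
    """Return the "seqid" for the stateid returned by a LOCK or LOCKU.

    status_before and status_after map byte offsets to "READ_LT" or
    "WRITE_LT"; an offset that is not locked does not appear as a key.
    """
    changed = status_before != status_after
    if changed:
        # Any change in the locking status of any byte requires an increment.
        return seqid + 1
    if bump_when_unchanged:
        # No change in locking status: incrementing is optional.
        return seqid + 1
    return seqid
```

For example, re-locking a range for WRITE_LT that is already locked for WRITE_LT leaves `status_after` equal to `status_before`, so the result depends only on the policy flag.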
| </section> | ||||
| <!-- [auth] "Stateid Sequence Values and Byte-Range Locks" --> | ||||
| <section anchor="multiple_openowners" numbered="true" toc="default"> | ||||
| <name>Issues with Multiple Open-Owners</name> | ||||
| <t> | ||||
| When the same file is opened by multiple open-owners, | ||||
| a client will have multiple OPEN stateids for that | ||||
| file, each associated with a different open-owner. | ||||
| In that case, there can be multiple LOCK and LOCKU | ||||
| requests for the same lock-owner sent using the | ||||
| different OPEN stateids, and so a situation may | ||||
| arise in which there are multiple stateids, each | ||||
| representing byte-range locks on the same file and | ||||
| held by the same lock-owner but each associated with | ||||
| a different open-owner. | ||||
| </t> | ||||
| <t> | ||||
| In such a situation, the locking status of each byte | ||||
| (i.e., whether it is locked, the READ_LT or WRITE_LT type of | ||||
| the lock, and the lock-owner holding the lock) <bcp14>MUST</bcp14> | ||||
| reflect the last LOCK or LOCKU operation done for the | ||||
| lock-owner in question, independent of the stateid through | ||||
| which the request was sent. | ||||
| </t> | ||||
| <t> | ||||
| When a byte is locked by the lock-owner in question, the | ||||
| open-owner to which that byte-range lock is assigned <bcp14>SHOULD</bcp14> be that | ||||
| of the open-owner associated with the stateid through | ||||
| which the last LOCK of that byte was done. When there | ||||
| is a change in the open-owner associated with locks for | ||||
| the stateid through which a LOCK or LOCKU was done, the | ||||
| "seqid" field of the stateid <bcp14>MUST</bcp14> be incremented, even | ||||
| if the locking, in terms of lock-owners, has not changed. | ||||
| When there is a change to the set of locked bytes associated | ||||
| with a different stateid for the same lock-owner, i.e., | ||||
| associated with a different open-owner, the "seqid" value | ||||
| for that stateid <bcp14>MUST NOT</bcp14> be incremented. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Issues with Multiple Open-Owners" --> | ||||
| <section anchor="blocking_locks" numbered="true" toc="default"> | ||||
| <name>Blocking Locks</name> | ||||
| <t> | ||||
| Some clients require the support of blocking locks. While NFSv4.1 | ||||
| provides a callback when a previously unavailable lock becomes | ||||
| available, this is an <bcp14>OPTIONAL</bcp14> feature and clients cannot | ||||
| depend on its presence. Clients need to be prepared to continually | ||||
| poll for the lock. This presents a fairness problem. Two of | ||||
| the lock types, READW_LT and WRITEW_LT, are used to indicate to the | ||||
| server that the client is requesting a blocking lock. When the | ||||
| callback is not used, the server should maintain an ordered | ||||
| list of pending blocking locks. When the conflicting lock is | ||||
| released, the server may wait for the period of time equal to | ||||
| lease_time for the first waiting | ||||
| client to re-request the lock. After the lease period expires, the | ||||
| next waiting client request is allowed the lock. Clients are required | ||||
| to poll at an interval sufficiently small that they are likely to acquire | ||||
| the lock in a timely manner. The server is not required to maintain a | ||||
| list of pending blocked locks, as the list serves to increase fairness and | ||||
| is not needed for correct operation. Because of the unordered nature of crash | ||||
| recovery, storing of lock state to stable storage would be required to | ||||
| guarantee ordered granting of blocking locks. | ||||
| </t> | ||||
| <t> | ||||
| Servers may also note the lock types and delay returning denial of the | ||||
| request to allow extra time for a conflicting lock to be released, | ||||
| allowing a successful return. In this way, clients can avoid the | ||||
| burden of needless frequent polling for blocking locks. The server | ||||
| should take care in the length of delay in the event the client | ||||
| retransmits the request. | ||||
| </t> | ||||
| <t> | ||||
| If a server receives a blocking LOCK operation, denies it, and then | ||||
| later receives a nonblocking request for the same lock, which is | ||||
| also denied, then it should remove the lock in question from its list of | ||||
| pending blocking locks. Clients should use such a nonblocking request | ||||
| to indicate to the server that this is the last time they intend to poll | ||||
| for the lock, as may happen when the process requesting the lock is | ||||
| interrupted. This is a courtesy to the server, to prevent it from | ||||
| unnecessarily waiting a lease period before granting other LOCK operations. | ||||
| However, clients are not required to perform this courtesy, and servers | ||||
| must not depend on them doing so. Also, clients must be prepared for | ||||
| the possibility that this final locking request will be accepted. | ||||
| </t> | ||||
| <t> | ||||
| When a server indicates, via the flag OPEN4_RESULT_MAY_NOTIFY_LOCK, that | ||||
| CB_NOTIFY_LOCK callbacks might be done for the current open file, the | ||||
| client should take notice of this, but, since this is a hint, cannot | ||||
| rely on a CB_NOTIFY_LOCK always being done. A client may reasonably | ||||
| reduce the frequency with which it polls for a denied lock, since the | ||||
| greater latency that might occur is likely to be eliminated given a | ||||
| prompt callback, but it still needs to poll. When it receives a | ||||
| CB_NOTIFY_LOCK, it should promptly try to obtain the lock, but it | ||||
| should be aware that other clients may be polling and that the server is under | ||||
| no obligation to reserve the lock for that particular client. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] title="Blocking Locks" --> | ||||
| <section anchor="share_reserve" numbered="true" toc="default"> | ||||
| <name>Share Reservations</name> | ||||
| <t> | ||||
| A share reservation is a mechanism to control access to a file. It is | ||||
| a separate and independent mechanism from byte-range locking. When a | ||||
| client opens a file, it sends an OPEN operation to the server | ||||
| specifying the type of access required (READ, WRITE, or BOTH) and the | ||||
| type of access to deny others (OPEN4_SHARE_DENY_NONE, | ||||
| OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH). If | ||||
| the OPEN fails, the client will fail the application's open request. | ||||
| </t> | ||||
| <t> | ||||
| Pseudo-code definition of the semantics: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| if (request.access == 0) { | ||||
| return (NFS4ERR_INVAL); | ||||
| } else if ((request.access & file_state.deny) || | ||||
| (request.deny & file_state.access)) { | ||||
| return (NFS4ERR_SHARE_DENIED); | ||||
| } | ||||
| return (NFS4ERR_OK);]]></sourcecode> | ||||
| <t> | ||||
| When doing this checking of share reservations on OPEN, the current | ||||
| file_state used in the algorithm includes bits that reflect all | ||||
| current opens, including those for the open-owner making the | ||||
| new OPEN request. | ||||
| </t> | ||||
| <t> | ||||
| The constants used for the OPEN and OPEN_DOWNGRADE operations for the | ||||
| access and deny fields are as follows: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const OPEN4_SHARE_ACCESS_READ = 0x00000001; | ||||
| const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; | ||||
| const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; | ||||
| const OPEN4_SHARE_DENY_NONE = 0x00000000; | ||||
| const OPEN4_SHARE_DENY_READ = 0x00000001; | ||||
| const OPEN4_SHARE_DENY_WRITE = 0x00000002; | ||||
| const OPEN4_SHARE_DENY_BOTH = 0x00000003;]]></sourcecode> | ||||
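The share reservation check can also be written as a small runnable sketch using the constants above. The function name `check_share` and the use of strings for status codes are assumptions for illustration. `file_access` and `file_deny` stand for the union of the access and deny bits of all current opens of the file.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


def check_share(request_access, request_deny, file_access, file_deny):
    """Apply the share reservation test to an OPEN request.

    Status codes are returned as strings for illustration only.
    """
    if request_access == 0:
        # An OPEN has to ask for some form of access.
        return "NFS4ERR_INVAL"
    if (request_access & file_deny) or (request_deny & file_access):
        # Either the requested access is denied by an existing open, or
        # the requested deny mode would exclude access already granted.
        return "NFS4ERR_SHARE_DENIED"
    return "NFS4_OK"
```

For instance, if a file is already open for OPEN4_SHARE_ACCESS_READ with OPEN4_SHARE_DENY_WRITE, a new request for write access is refused, and so is a read-only request that specifies OPEN4_SHARE_DENY_READ.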
| </section> | ||||
| <!-- [auth] "Share Reservations" --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>OPEN/CLOSE Operations</name> | ||||
| <t> | ||||
| To provide correct share semantics, a client <bcp14>MUST</bcp14> use the OPEN | ||||
| operation to obtain the initial filehandle and indicate the desired | ||||
| access and what access, if any, to deny. Even if the client intends to | ||||
| use a special stateid for anonymous state or READ bypass, | ||||
| it must still obtain the | ||||
| filehandle for the regular file with the OPEN operation so the | ||||
| appropriate share semantics can be applied. Clients that do not | ||||
| have a deny mode built into their programming interfaces for opening | ||||
| a file should request a deny mode of | ||||
| OPEN4_SHARE_DENY_NONE. | ||||
| </t> | ||||
| <t> | ||||
| The OPEN operation with the CREATE flag also subsumes the CREATE | ||||
| operation for regular files as used in previous versions of the NFS | ||||
| protocol. This allows a create with a share to be done atomically. | ||||
| </t> | ||||
| <t> | ||||
| The CLOSE operation removes all share reservations held by the | ||||
| open-owner on that file. If byte-range locks are held, the client | ||||
| <bcp14>SHOULD</bcp14> release all locks before sending a CLOSE operation. The server <bcp14>MAY</bcp14> free | ||||
| all outstanding locks on CLOSE, but some servers may not support the | ||||
| CLOSE of a file that still has byte-range locks held. The server <bcp14>MUST</bcp14> | ||||
| return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after the | ||||
| CLOSE. | ||||
| </t> | ||||
| <t> | ||||
| The LOOKUP operation will return a filehandle without establishing any | ||||
| lock state on the server. Without a valid stateid, the server will | ||||
| assume that the client has the least access. For example, if one | ||||
| client opened a file with OPEN4_SHARE_DENY_BOTH and another client | ||||
| accesses the file via a filehandle obtained through LOOKUP, the | ||||
| second client could only read the file using the special read | ||||
| bypass stateid. The second client could not WRITE the file | ||||
| at all because it would | ||||
| not have a valid stateid from OPEN and the special anonymous stateid would | ||||
| not be allowed access. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "OPEN/CLOSE Operations" --> | ||||
| <section anchor="open_upgrade" numbered="true" toc="default"> | ||||
| <name>Open Upgrade and Downgrade</name> | ||||
| <t> | ||||
| When an OPEN is done for a file and the open-owner for which the OPEN | ||||
| is being done already has the file open, the result is to upgrade the | ||||
| open file status maintained on the server to include the access and | ||||
| deny bits specified by the new OPEN as well as those for the existing | ||||
| OPEN. The result is that there is one open file, as far as the | ||||
| protocol is concerned, and it includes the union of the access and | ||||
| deny bits for all of the OPEN requests completed. The OPEN | ||||
| is represented by a single stateid whose "other" value matches | ||||
| that of the original open, and whose "seqid" value is incremented | ||||
| to reflect the occurrence of the upgrade. The increment is required | ||||
| in cases in which the "upgrade" results in no change to the open mode (e.g., an OPEN | ||||
| is done for read when the existing open file is opened for | ||||
| OPEN4_SHARE_ACCESS_BOTH). Only a single CLOSE will be done to reset the | ||||
| effects of both OPENs. The client may use the stateid returned | ||||
| by the OPEN effecting the upgrade or a stateid sharing the | ||||
| same "other" field and a seqid of zero, | ||||
| although care needs to be taken as far as upgrades that happen | ||||
| while the CLOSE is pending. Note that the | ||||
| client, when sending the OPEN, may not know that the same file is in | ||||
| fact being opened. The above only applies if both OPENs result in | ||||
| the OPENed object being designated by the same filehandle. | ||||
| </t> | ||||
| <t> | ||||
| When the server chooses to export multiple filehandles corresponding | ||||
| to the same file object and returns different filehandles on two | ||||
| different OPENs of the same file object, the server <bcp14>MUST NOT</bcp14> "OR" | ||||
| together the access and deny bits and coalesce the two open files. | ||||
| Instead, the server must maintain separate OPENs with separate | ||||
| stateids and will require separate CLOSEs to free them. | ||||
| </t> | ||||
| <t> | ||||
| When multiple open files on the client are merged into a single OPEN | ||||
| file object on the server, the close of one of the open files (on the | ||||
| client) may necessitate change of the access and deny status of the | ||||
| open file on the server. This is because the union of the access and | ||||
| deny bits for the remaining opens may be smaller (i.e., a proper | ||||
| subset) than previously. The OPEN_DOWNGRADE operation is used to make | ||||
| the necessary change and the client should use it to update the server | ||||
| so that share reservation requests by other clients are handled | ||||
| properly. The stateid returned has the same "other" field as | ||||
| that passed to the server. The "seqid" value in the returned | ||||
| stateid <bcp14>MUST</bcp14> be incremented, even in situations in which there is | ||||
| no change to the access and deny bits for the file. | ||||
| </t> | ||||
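The bookkeeping for open upgrade and downgrade can be sketched as below. The `OpenState` class and both function names are hypothetical. The downgrade check is a simplification that only verifies the new bits are a subset of the current ones, and it signals misuse with a Python exception rather than a protocol error. Wraparound of the "seqid" field is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenState:
    other: bytes  # the "other" field, fixed for the life of the open
    seqid: int    # incremented on every upgrade or downgrade
    access: int   # union of OPEN4_SHARE_ACCESS_* bits
    deny: int     # union of OPEN4_SHARE_DENY_* bits


def open_upgrade(state, access, deny):
    """Fold a further OPEN by the same open-owner into the existing open."""
    # The seqid is incremented even when the union leaves the bits unchanged.
    return OpenState(state.other, state.seqid + 1,
                     state.access | access, state.deny | deny)


def open_downgrade(state, access, deny):
    """Reduce the access and deny bits to those still needed by the client."""
    if access == 0 or (access & ~state.access) or (deny & ~state.deny):
        # Simplified validation: a downgrade cannot add bits.
        raise ValueError("downgrade must not add access or deny bits")
    # The seqid is incremented even if the bits do not change.
    return OpenState(state.other, state.seqid + 1, access, deny)
```

In both directions the "other" field is carried over unchanged, which is what lets the client recognize the returned stateid as belonging to the same open.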
| </section> | ||||
| <!-- [auth] "Open Upgrade and Downgrade" --> | ||||
| <section anchor="parallel_opens" numbered="true" toc="default"> | ||||
| <name>Parallel OPENs</name> | ||||
| <t> | ||||
| Unlike the case of NFSv4.0, in which OPEN operations for the same | ||||
| open-owner are inherently serialized because of the owner-based seqid, | ||||
| multiple OPENs for the same open-owner may be done in parallel. When | ||||
| clients do this, they may encounter situations in which, because | ||||
| of the existence of hard links, two OPEN operations may turn out | ||||
| to open the same file, with a later OPEN performed being an upgrade of | ||||
| the first, with this fact only visible to the | ||||
| client once the operations complete. | ||||
| </t> | ||||
| <t> | ||||
| In this situation, clients may determine the order in which the | ||||
| OPENs were performed by examining the stateids returned by the OPENs. | ||||
| Stateids that share a common value of the "other" field can be | ||||
| recognized as having opened the same file, with the order of the | ||||
| operations determinable from the order of the "seqid" fields, mod | ||||
| any possible wraparound of the 32-bit field. | ||||
| </t> | ||||
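One way a client might compare two "seqid" values while allowing for wraparound of the 32-bit field is sketched below. The function name is hypothetical, and the comparison assumes the two values are fewer than 2**31 increments apart. It is meaningful only for stateids that share the same "other" value.

```python
def seqid_is_later(a, b):
    """Return True when seqid a was issued after seqid b.

    The difference is taken modulo 2**32, so a value that has wrapped
    around still compares as later than the value it wrapped from.
    """
    diff = (a - b) & 0xFFFFFFFF
    return diff != 0 and diff < 0x80000000
```

For two parallel OPENs that turn out to have opened the same file, the OPEN whose returned seqid compares as later is the one that was performed as the upgrade.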
| <t> | ||||
| When the possibility exists that the client will send multiple | ||||
| OPENs for the same open-owner in parallel, it may be the case that | ||||
| an open upgrade may happen without the client knowing beforehand | ||||
| that this could happen. Because of this possibility, CLOSEs and | ||||
| OPEN_DOWNGRADEs should generally be sent with a non-zero seqid | ||||
| in the stateid, to avoid the possibility that the status change | ||||
| associated with an open upgrade is inadvertently lost. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "Parallel OPENs" --> | ||||
| <section anchor="open_br_reclaim" numbered="true" toc="default"> | ||||
| <name>Reclaim of Open and Byte-Range Locks</name> | ||||
| <t> | ||||
| Special forms of the LOCK and OPEN operations are provided when it | ||||
| is necessary to re-establish byte-range locks or opens after a | ||||
| server failure. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| To reclaim existing opens, an OPEN operation is performed | ||||
| using a CLAIM_PREVIOUS. Because the client, in this type | ||||
| of situation, will have already opened the file and have | ||||
| the filehandle of the target file, this operation requires | ||||
| that the current filehandle be the target file, rather than | ||||
| a directory, and no file name is specified. | ||||
| </li> | ||||
| <li> | ||||
| To reclaim byte-range locks, a LOCK operation with the | ||||
| reclaim parameter set to true is used. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Reclaims of opens associated with delegations are discussed in | ||||
| <xref target="delegation_recovery" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] "File Locking and Share Reservations" --> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Client-Side Caching</name> | ||||
| <t> | ||||
| Client-side caching of data, of file attributes, and of file names is | ||||
| essential to providing good performance with the NFS protocol. | ||||
| Providing distributed cache coherence is a difficult problem, and | ||||
| previous versions of the NFS protocol have not attempted it. Instead, | ||||
| several NFS client implementation techniques have been used to reduce | ||||
| the problems that a lack of coherence poses for users. These | ||||
| techniques have not been clearly defined by earlier protocol | ||||
| specifications, and it is often unclear what is valid or invalid client | ||||
| behavior. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol uses many techniques similar to those that | ||||
| have been used in previous protocol versions. The NFSv4.1 | ||||
| protocol does not provide distributed cache coherence. However, it | ||||
| defines a more limited set of caching guarantees to allow locks and | ||||
| share reservations to be used without destructive interference from | ||||
| client-side caching. | ||||
| </t> | ||||
| <t> | ||||
| In addition, the NFSv4.1 protocol introduces a delegation | ||||
| mechanism, which allows many decisions normally made by the server to | ||||
| be made locally by clients. This mechanism provides efficient support | ||||
| of the common cases where sharing is infrequent or where sharing is | ||||
| read-only. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Performance Challenges for Client-Side Caching</name> | ||||
| <t> | ||||
| Caching techniques used in previous versions of the NFS protocol have | ||||
| been successful in providing good performance. However, several | ||||
| scalability challenges can arise when those techniques are used with | ||||
| very large numbers of clients. This is particularly true when clients | ||||
| are geographically distributed, which classically increases the latency | ||||
| for cache revalidation requests. | ||||
| </t> | ||||
| <t> | ||||
| The previous versions of the NFS protocol repeat their file data cache | ||||
| validation requests at the time the file is opened. This behavior can | ||||
| have serious performance drawbacks. A common case is one in which a | ||||
| file is only accessed by a single client. Therefore, sharing is | ||||
| infrequent. | ||||
| </t> | ||||
| <t> | ||||
| In this case, repeated references to the server to find that no | ||||
| conflicts exist are expensive. A better option with regard to | ||||
| performance is to allow a client that repeatedly opens a file to do so | ||||
| without reference to the server. This is done until potentially | ||||
| conflicting operations from another client actually occur. | ||||
| </t> | ||||
| <t> | ||||
| A similar situation arises in connection with byte-range locking. Sending | ||||
| LOCK and LOCKU operations as well as the READ and | ||||
| WRITE operations necessary to make data caching consistent with the | ||||
| locking semantics (see <xref target="dc_file_locking" format="default"/>) | ||||
| can severely limit performance. When locking is used to provide | ||||
| protection against infrequent conflicts, a large penalty is incurred. | ||||
| This penalty may discourage the use of byte-range locking by applications. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol provides more aggressive caching strategies | ||||
| with the following design goals: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Compatibility with a large range of server semantics. | ||||
| </li> | ||||
| <li> | ||||
| Providing the same caching benefits as previous versions of | ||||
| the NFS protocol when unable to support the more aggressive model. | ||||
| </li> | ||||
| <li> | ||||
| Requirements for aggressive caching are organized so that a | ||||
| large portion of the benefit can be obtained even when not | ||||
| all of the requirements can be met. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The appropriate requirements for the server are discussed in later | ||||
| sections in which specific forms of caching are covered (see | ||||
| <xref target="open_delegation" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="deleg_and_cb" numbered="true" toc="default"> | ||||
| <name>Delegation and Callbacks</name> | ||||
| <t> | ||||
| Recallable delegation of server responsibilities for a file to a | ||||
| client improves performance by avoiding repeated requests to the | ||||
| server in the absence of inter-client conflict. With the use of a | ||||
| "callback" RPC from server to client, a server recalls delegated | ||||
| responsibilities when another client engages in sharing of a delegated | ||||
| file. | ||||
| </t> | ||||
| <t> | ||||
| A delegation is passed from the server to the client, specifying the | ||||
| object of the delegation and the type of delegation. There are | ||||
| different types of delegations, but each type contains a stateid to be | ||||
| used to represent the delegation when performing operations that | ||||
| depend on the delegation. This stateid is similar to those associated | ||||
| with locks and share reservations but differs in that the stateid for | ||||
| a delegation is associated with a client ID and may be used on behalf | ||||
| of all the open-owners for the given client. A delegation is made | ||||
| to the client as a whole and not to any specific process or thread of | ||||
| control within it. | ||||
| </t> | ||||
| <t> | ||||
| The backchannel is established by CREATE_SESSION and | ||||
| BIND_CONN_TO_SESSION, and the client is required | ||||
| to maintain it. Because the backchannel may be down, even | ||||
| temporarily, | ||||
| correct protocol operation does not depend on | ||||
| it. Preliminary testing of backchannel functionality by means of a | ||||
| CB_COMPOUND procedure with a single operation, CB_SEQUENCE, | ||||
| can be used to check the continuity of the backchannel. A | ||||
| server avoids delegating responsibilities until it has | ||||
| determined that the backchannel exists. Because the granting of a | ||||
| delegation is always conditional upon the absence of conflicting | ||||
| access, clients <bcp14>MUST NOT</bcp14> assume that a delegation will be granted and | ||||
| they <bcp14>MUST</bcp14> always be prepared for OPENs, WANT_DELEGATIONs, and | ||||
| GET_DIR_DELEGATIONs to be processed without any | ||||
| delegations being granted. | ||||
| </t> | ||||
| <t> | ||||
| Unlike locks, an operation by a second client to a delegated file will | ||||
| cause the server to recall a delegation through a callback. For | ||||
| individual operations, we will describe, under IMPLEMENTATION, when | ||||
| such operations are required to effect a recall. A number of | ||||
| points should be noted, however. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The server is free to recall a delegation | ||||
| whenever it feels it is desirable and may do so even if no | ||||
| operations requiring recall are being done. | ||||
| </li> | ||||
| <li> | ||||
| Operations done outside the NFSv4.1 protocol, due to, for | ||||
| example, access by other protocols, or by local access, | ||||
| also need to result in delegation recall when they make | ||||
| analogous changes to file system data. What is crucial | ||||
| is if the change would invalidate the guarantees provided | ||||
| by the delegation. When this is possible, the | ||||
| delegation needs to be recalled and <bcp14>MUST</bcp14> be returned or | ||||
| revoked before allowing the operation to proceed. | ||||
| </li> | ||||
| <li> | ||||
| The semantics of the file system are crucial in defining | ||||
| when delegation recall is required. If a particular change | ||||
| within a specific implementation causes change to a | ||||
| file attribute, then delegation recall is required, whether or not | ||||
| that operation has been specifically listed as requiring | ||||
| delegation recall. Again, what is critical is whether the | ||||
| guarantees provided by the delegation are being invalidated. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Despite those caveats, the implementation sections for a number | ||||
| of operations describe situations in which delegation recall | ||||
| would be required under some common circumstances: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For GETATTR, see <xref target="OP_GETATTR_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For OPEN, see <xref target="OP_OPEN_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For READ, see <xref target="OP_READ_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For REMOVE, see <xref target="OP_REMOVE_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For RENAME, see <xref target="OP_RENAME_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For SETATTR, see <xref target="OP_SETATTR_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For WRITE, see <xref target="OP_WRITE_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| On recall, the client holding the delegation needs to flush modified | ||||
| state (such as modified data) to the server and return the | ||||
| delegation. The conflicting request will not be acted on until | ||||
| the recall is complete. The recall is considered complete when | ||||
| the client returns the delegation or the server times out its wait | ||||
| for the delegation to be returned and revokes the delegation as | ||||
| a result of the timeout. In the interim, the server will either | ||||
| delay responding to conflicting requests or respond to them with | ||||
| NFS4ERR_DELAY. Following the resolution of the recall, the | ||||
| server has the information necessary to grant or deny the second | ||||
| client's request. | ||||
| </t> | ||||
| <t> | ||||
| At the time the client receives a delegation recall, it may have | ||||
| substantial state that needs to be flushed to the server. Therefore, | ||||
| the server should allow sufficient time for the delegation to be | ||||
| returned since it may involve numerous RPCs to the server. If the | ||||
| server is able to determine that the client is diligently flushing | ||||
| state to the server as a result of the recall, the server may extend | ||||
| the usual time allowed for a recall. However, the time allowed for | ||||
| recall completion should not be unbounded. | ||||
| </t> | ||||
| <t> | ||||
| An example of this is when responsibility to mediate opens on a given | ||||
| file is delegated to a client (see <xref target="open_delegation" format="default"/>). | ||||
| The server will not know what opens are in effect on the client. | ||||
| Without this knowledge, the server will be unable to determine if the | ||||
| access and deny states for the file allow any particular open until | ||||
| the delegation for the file has been returned. | ||||
| </t> | ||||
| <t> | ||||
| A client failure or a network partition can result in failure to | ||||
| respond to a recall callback. In this case, the server will revoke the | ||||
| delegation, which in turn will render useless any modified state still | ||||
| on the client. | ||||
| </t> | ||||
| <section anchor="delegation_recovery" numbered="true" toc="default"> | ||||
| <name>Delegation Recovery</name> | ||||
| <t> | ||||
| There are three situations that delegation recovery needs to deal with: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| client restart | ||||
| </li> | ||||
| <li> | ||||
| server restart | ||||
| </li> | ||||
| <li> | ||||
| network partition (full or backchannel-only) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In the event the client restarts, the failure to renew | ||||
| the lease will result in the revocation of byte-range locks and share | ||||
| reservations. Delegations, however, may be treated a bit differently. | ||||
| </t> | ||||
| <t> | ||||
| There will be situations in which delegations will need to be | ||||
| re-established after a client restarts. The reason for this | ||||
| is that the client may have file data stored locally and this data was | ||||
| associated with the previously held delegations. The client will need | ||||
| to re-establish the appropriate file state on the server. | ||||
| </t> | ||||
| <t> | ||||
| To allow for this type of client recovery, the server <bcp14>MAY</bcp14> extend the | ||||
| period for delegation recovery beyond the typical lease expiration | ||||
| period. This implies that requests from other clients that conflict | ||||
| with these delegations will need to wait. Because the normal recall | ||||
| process may require significant time for the client to flush changed | ||||
| state to the server, other clients need to be prepared for delays that | ||||
| occur because of a conflicting delegation. This longer interval would | ||||
| increase the window for clients to restart and consult stable storage | ||||
| so that the delegations can be reclaimed. For OPEN delegations, such | ||||
| delegations are reclaimed using OPEN with a claim type of | ||||
| CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (see Sections | ||||
| <xref target="data_caching_revocation" format="counter"/> | ||||
| and <xref target="OP_OPEN" format="counter"/> for discussion of OPEN delegation | ||||
| and the details of OPEN, respectively). | ||||
| </t> | ||||
| <t> | ||||
| A server <bcp14>MAY</bcp14> support claim types of CLAIM_DELEGATE_PREV and | ||||
| CLAIM_DELEG_PREV_FH, and if it | ||||
| does, it <bcp14>MUST NOT</bcp14> remove delegations upon a CREATE_SESSION that | ||||
| confirms a client ID created by EXCHANGE_ID. | ||||
| Instead, the server <bcp14>MUST</bcp14>, for a period of time no less than that of the value of | ||||
| the lease_time attribute, maintain the client's delegations to allow | ||||
| time for the client to send CLAIM_DELEGATE_PREV and/or CLAIM_DELEG_PREV_FH requests. The server | ||||
| that supports CLAIM_DELEGATE_PREV and/or CLAIM_DELEG_PREV_FH <bcp14>MUST</bcp14> support the DELEGPURGE | ||||
| operation. | ||||
| </t> | ||||
| <t> | ||||
| When the server restarts, delegations are reclaimed (using | ||||
| the OPEN operation with CLAIM_PREVIOUS) in a similar fashion to byte-range | ||||
| locks and share reservations. However, there is a slight semantic | ||||
| difference. In the normal case, if the server decides that a | ||||
| delegation should not be granted, it performs the requested action | ||||
| (e.g., OPEN) without granting any delegation. For reclaim, the server | ||||
| grants the delegation but a special designation is applied so that the | ||||
| client treats the delegation as having been granted but recalled by | ||||
| the server. Because of this, the client has the duty to write all | ||||
| modified state to the server and then return the delegation. This | ||||
| process of handling delegation reclaim reconciles three principles of | ||||
| the NFSv4.1 protocol: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Upon reclaim, a client reporting resources assigned to it by an | ||||
| earlier server instance must be granted those resources. | ||||
| </li> | ||||
| <li> | ||||
| The server has unquestionable authority to determine whether | ||||
| delegations are to be granted and, once granted, whether they are to | ||||
| be continued. | ||||
| </li> | ||||
| <li> | ||||
| The use of callbacks should not be depended upon until the client has | ||||
| proven its ability to receive them. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When a client needs to reclaim a delegation and there is no associated | ||||
| open, the client may use the CLAIM_PREVIOUS variant of the | ||||
| WANT_DELEGATION operation. However, since the server is not required | ||||
| to support this operation, an alternative is to reclaim via a dummy OPEN | ||||
| together with the delegation | ||||
| using an OPEN of type CLAIM_PREVIOUS. The dummy open file can | ||||
| be released using a CLOSE to re-establish the original state to be | ||||
| reclaimed, a delegation without an associated open. | ||||
| </t> | ||||
| <t> | ||||
| When a client has more than a single open associated with a delegation, | ||||
| state for those additional opens can be established using OPEN | ||||
| operations of type CLAIM_DELEGATE_CUR. When these are used to | ||||
| establish opens associated with reclaimed delegations, the | ||||
| server <bcp14>MUST</bcp14> allow them when made within the grace period. | ||||
| </t> | ||||
| <t> | ||||
| When a network partition occurs, delegations are subject to freeing by | ||||
| the server when the lease renewal period expires. This is similar to | ||||
| the behavior for locks and share reservations. For delegations, | ||||
| however, the server may extend the period in which conflicting | ||||
| requests are held off. Eventually, the occurrence of a conflicting | ||||
| request from another client will cause revocation of the delegation. | ||||
| A loss of the backchannel (e.g., by later network configuration | ||||
| change) will have the same effect. A recall request will fail and | ||||
| revocation of the delegation will result. | ||||
| </t> | ||||
| <t> | ||||
| A client normally finds out about revocation of a delegation when it | ||||
| uses a stateid associated with a delegation and receives one of the | ||||
| errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED. | ||||
| It also may find out about delegation revocation | ||||
| after a client restart when it attempts to reclaim a delegation and | ||||
| receives one of those same errors. Note that in the case of a revoked OPEN_DELEGATE_WRITE delegation, there are issues because data may have been modified | ||||
| by the client whose delegation is revoked and separately by other | ||||
| clients. See <xref target="revocation_recovery_write" format="default"/> | ||||
| for a discussion of such issues. Note also that when | ||||
| delegations are revoked, information about the revoked delegation will | ||||
| be written by the server to stable storage (as described in | ||||
| <xref target="network_partitions_and_recovery" format="default"/>). This is done | ||||
| to deal with the case in | ||||
| which a server restarts after revoking a delegation but before the | ||||
| client holding the revoked delegation is notified about the | ||||
| revocation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Data Caching</name> | ||||
| <t> | ||||
| When applications share access to a set of files, they need to be | ||||
| implemented so as to take account of the possibility of conflicting | ||||
| access by another application. This is true whether the applications | ||||
| in question execute on different clients or reside on the same client. | ||||
| </t> | ||||
| <t> | ||||
| Share reservations and byte-range locks are the facilities the NFSv4.1 protocol | ||||
| provides to allow applications to coordinate access by | ||||
| using mutual exclusion facilities. The NFSv4.1 protocol's | ||||
| data caching must be implemented such that it does not invalidate the | ||||
| assumptions on which those using these facilities depend. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Data Caching and OPENs</name> | ||||
| <t> | ||||
| In order to avoid invalidating the sharing assumptions on which | ||||
| applications rely, NFSv4.1 clients should not provide cached | ||||
| data to applications or modify it on behalf of an application when it | ||||
| would not be valid to obtain or modify that same data via a READ or | ||||
| WRITE operation. | ||||
| </t> | ||||
| <t> | ||||
| Furthermore, in the absence of an OPEN delegation | ||||
| (see <xref target="open_delegation" format="default"/>), | ||||
| two additional rules apply. Note that these rules are | ||||
| obeyed in practice by many NFSv3 clients. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| First, cached data present on a client must be revalidated after doing | ||||
| an OPEN. Revalidating means that the client fetches the change | ||||
| attribute from the server, compares it with the cached change | ||||
| attribute, and if different, declares the cached data (as well as the | ||||
| cached attributes) as invalid. This is to ensure that the data for | ||||
| the OPENed file is still correctly reflected in the client's cache. | ||||
| This validation must be done at least when the client's OPEN operation | ||||
| includes a deny of OPEN4_SHARE_DENY_WRITE or | ||||
| OPEN4_SHARE_DENY_BOTH, thus terminating a period in which | ||||
| other | ||||
| clients may have had the opportunity to open the file with | ||||
| OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH | ||||
| access. Clients may choose to do the revalidation more often (i.e., at | ||||
| OPENs specifying a deny mode of OPEN4_SHARE_DENY_NONE) to parallel the NFSv3 protocol's | ||||
| practice for the benefit of users assuming this degree of cache | ||||
| revalidation. | ||||
| </t> | ||||
| <t> | ||||
| Since the change attribute is updated for data and metadata | ||||
| modifications, some client implementors may be tempted to use the | ||||
| time_modify attribute and not the change attribute to validate cached data, so that | ||||
| metadata changes do not spuriously invalidate clean data. The | ||||
| implementor is cautioned against this approach. The change attribute is | ||||
| guaranteed to change for each update to the file, whereas time_modify | ||||
| is guaranteed to change only at the granularity of the time_delta | ||||
| attribute. Use by the client's data cache validation logic of | ||||
| time_modify and not change runs the risk of the client incorrectly | ||||
| marking stale data as valid. Thus, any cache validation approach | ||||
| by the client <bcp14>MUST</bcp14> include the use of the change attribute. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Second, modified data must be flushed to the server before closing a | ||||
| file OPENed for OPEN4_SHARE_ACCESS_WRITE. This is complementary to the first rule. If | ||||
| the data is not flushed at CLOSE, the revalidation done | ||||
| after the client OPENs a file is unable to achieve its | ||||
| purpose. The other aspect to flushing the data before | ||||
| close is that the data must be committed to stable | ||||
| storage, at the server, before the CLOSE operation is | ||||
| requested by the client. In the case of a server restart and a CLOSEd | ||||
| file, it may not be possible to retransmit the data to be written to | ||||
| the file, hence, this requirement. | ||||
| </li> | ||||
| </ul> | ||||
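| <t> | ||||
| The revalidation step in the first rule can be sketched as follows. This is a minimal illustration and not part of the protocol: the CachedFile class and the revalidate_on_open function are hypothetical names, and the change attribute is assumed to have already been fetched from the server (e.g., by a GETATTR following the OPEN). | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from dataclasses import dataclass, field


@dataclass
class CachedFile:
    """Hypothetical client-side cache entry for one file."""
    change: int                                 # change attribute when data was cached
    data: dict = field(default_factory=dict)    # offset -> cached bytes
    attrs: dict = field(default_factory=dict)   # cached attributes


def revalidate_on_open(entry: CachedFile, server_change: int) -> bool:
    """Compare the cached change attribute with the server's value.

    On a mismatch, the cached data and cached attributes are declared
    invalid (dropped here). Returns True when the cache was still valid.
    """
    if entry.change == server_change:
        return True
    entry.data.clear()
    entry.attrs.clear()
    entry.change = server_change
    return False
```
| ]]></sourcecode> | ||||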
| </section> | ||||
| <section anchor="dc_file_locking" numbered="true" toc="default"> | ||||
| <name>Data Caching and File Locking</name> | ||||
| <t> | ||||
| For those applications that choose to use byte-range locking instead of | ||||
| share reservations to exclude inconsistent file access, there is an | ||||
| analogous set of constraints that apply to client-side data caching. | ||||
| These rules are effective only if the byte-range locking is used in a way | ||||
| that corresponds to the actual READ and WRITE operations | ||||
| executed. This is as opposed to byte-range locking that is based on pure | ||||
| convention. For example, it is possible to manipulate a two-megabyte | ||||
| file by dividing the file into two one-megabyte ranges and protecting | ||||
| access to the two byte-ranges by byte-range locks on bytes zero and one. A WRITE_LT lock on | ||||
| byte zero of the file would represent the right to perform | ||||
| READ and WRITE operations on the first byte-range. A WRITE_LT lock on | ||||
| byte one of the file would represent the right to perform READ and WRITE | ||||
| operations on the second byte-range. As long as all applications | ||||
| manipulating the file obey this convention, they will work on a local | ||||
| file system. However, they may not work with the NFSv4.1 | ||||
| protocol unless clients refrain from data caching. | ||||
| </t> | ||||
| <t> | ||||
| The rules for data caching in the byte-range locking environment are: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| First, when a client obtains a byte-range lock for a particular byte-range, the | ||||
| data cache corresponding to that byte-range (if any cache data exists) | ||||
| must be revalidated. If the change attribute indicates that the file | ||||
| may have been updated since the cached data was obtained, the client | ||||
| must flush or invalidate the cached data for the newly locked byte-range. | ||||
| A client might choose to invalidate all of the non-modified cached data | ||||
| that it has for the file, but the only requirement for correct | ||||
| operation is to invalidate all of the data in the newly locked byte-range. | ||||
| </li> | ||||
| <li> | ||||
| Second, before releasing a WRITE_LT lock for a byte-range, all modified data | ||||
| for that byte-range must be flushed to the server. The modified data must | ||||
| also be written to stable storage. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that flushing data to the server and the invalidation of cached | ||||
| data must reflect the actual byte-ranges locked or unlocked. Rounding | ||||
| these up or down to reflect client cache block boundaries will cause | ||||
| problems if not carefully done. For example, writing a modified block | ||||
| when only half of that block is within an area being unlocked may | ||||
| cause invalid modification to the byte-range outside the unlocked area. | ||||
| This, in turn, may be part of a byte-range locked by another client. | ||||
| Clients can avoid this situation by synchronously performing portions | ||||
| of WRITE operations that overlap that portion (initial or final) that | ||||
| is not a full block. Similarly, invalidating a locked area that is | ||||
| not an integral number of full buffer blocks would require the client | ||||
| to read one or two partial blocks from the server if the revalidation | ||||
| procedure shows that the data that the client possesses may not be | ||||
| valid. | ||||
| </t> | ||||
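| <t> | ||||
| One way to keep flushed data within the bounds of a lock is to clip each modified cache block to the locked byte-range before writing it. The sketch below is illustrative only: the function name, the block-index representation of the cache, and the half-open interval convention are assumptions made for this example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def clip_flush_extents(dirty_blocks, block_size, lock_start, lock_end):
    """Return (offset, length) writes for dirty blocks, clipped to the lock.

    dirty_blocks: iterable of cache block indices holding modified data.
    lock_start, lock_end: the byte-range being unlocked, as the half-open
    interval [lock_start, lock_end).
    """
    extents = []
    for index in sorted(dirty_blocks):
        block_start = index * block_size
        block_end = block_start + block_size
        start = max(block_start, lock_start)
        end = min(block_end, lock_end)
        if start < end:  # skip blocks lying entirely outside the lock
            extents.append((start, end - start))
    return extents
```
| ]]></sourcecode> | ||||
| <t> | ||||
| With 4096-byte blocks and a lock on bytes 1000 through 4999, dirty blocks 0 and 1 yield a write at offset 1000 of length 3096 and a write at offset 4096 of length 904; no bytes outside the locked range are written. | ||||
| </t> | ||||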
| <t> | ||||
| The data that is written to the server as a prerequisite to the | ||||
| unlocking of a byte-range must be written, at the server, to stable | ||||
| storage. The client may accomplish this either with synchronous | ||||
| writes or by following asynchronous writes with a COMMIT operation. | ||||
| This is required because retransmission of the modified data after a | ||||
| server restart might conflict with a lock held by another client. | ||||
| </t> | ||||
| <t> | ||||
| A client implementation may choose to accommodate applications that | ||||
| use byte-range locking in non-standard ways (e.g., using a byte-range lock as a | ||||
| global semaphore) by flushing to the server more data upon a LOCKU | ||||
| than is covered by the locked range. This may include modified data | ||||
| within files other than the one for which the unlocks are being done. | ||||
| In such cases, the client must not interfere with applications whose | ||||
| READs and WRITEs are being done only within the bounds of byte-range locks | ||||
| that the application holds. For example, an application locks a | ||||
| single byte of a file and proceeds to write that single byte. A | ||||
| client that chose to handle a LOCKU by flushing all modified data to | ||||
| the server could validly write that single byte in response to an | ||||
| unrelated LOCKU operation. However, it would not be valid to write the entire | ||||
| block in which that single written byte was located since it includes | ||||
| an area that is not locked and might be locked by another client. | ||||
| Client implementations can avoid this problem by dividing files with | ||||
| modified data into those for which all modifications are done to areas | ||||
| covered by an appropriate byte-range lock and those for which there are | ||||
| modifications not covered by a byte-range lock. Any writes done for the | ||||
| former class of files must not include areas not locked and thus not | ||||
| modified on the client. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Data Caching and Mandatory File Locking</name> | ||||
| <t> | ||||
| Client-side data caching needs to respect mandatory byte-range locking when | ||||
| it is in effect. The presence of mandatory byte-range locking for a given | ||||
| file is indicated when the client gets back NFS4ERR_LOCKED from a READ | ||||
| or WRITE operation on a file for which it has an appropriate share reservation. When | ||||
| mandatory locking is in effect for a file, the client must check for | ||||
| an appropriate byte-range lock for data being read or written. If a byte-range lock | ||||
| exists for the range being read or written, the client may satisfy the | ||||
| request using the client's validated cache. If an appropriate | ||||
| byte-range lock is not held for the range of the read or write, the read or write | ||||
| request must not be satisfied by the client's cache and the request | ||||
| must be sent to the server for processing. When a read or write | ||||
| request partially overlaps a locked byte-range, the request should be | ||||
| subdivided into multiple pieces with each byte-range (locked or not) | ||||
| treated appropriately. | ||||
| </t> | ||||
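| <t> | ||||
| The subdivision of a partially overlapping request can be sketched as follows. The function name and the representation of locks as half-open (start, end) pairs are assumptions for illustration; the sketch also assumes that the listed locks do not overlap one another and are already of an appropriate type for the request. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def split_by_locks(req_start, req_end, locks):
    """Split [req_start, req_end) into (start, end, locked) pieces.

    Pieces with locked=True may be satisfied from the validated cache;
    pieces with locked=False must be sent to the server.
    """
    pieces = []
    pos = req_start
    for lock_start, lock_end in sorted(locks):
        start = max(lock_start, req_start)
        end = min(lock_end, req_end)
        if start >= end:
            continue  # this lock does not intersect the request
        if pos < start:
            pieces.append((pos, start, False))  # gap before the lock
        pieces.append((start, end, True))       # covered by the lock
        pos = end
    if pos < req_end:
        pieces.append((pos, req_end, False))    # tail after the last lock
    return pieces
```
| ]]></sourcecode> | ||||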
| </section> | ||||
| <section anchor="data_caching_and_file_identity" numbered="true" toc="default"> | ||||
| <name>Data Caching and File Identity</name> | ||||
| <t> | ||||
| When clients cache data, the file data needs to be organized according | ||||
| to the file system object to which the data belongs. For NFSv3 | ||||
| clients, the typical practice has been to assume for the purpose of | ||||
| caching that distinct filehandles represent distinct file system | ||||
| objects. The client then has the choice to organize and maintain the | ||||
| data cache on this basis. | ||||
| </t> | ||||
| <t> | ||||
| In the NFSv4.1 protocol, there is now the possibility to have | ||||
| significant deviations from a "one filehandle per object" model | ||||
| because a filehandle may be constructed on the basis of the object's | ||||
| pathname. Therefore, clients need a reliable method to determine if | ||||
| two filehandles designate the same file system object. If clients | ||||
| were simply to assume that all distinct filehandles denote distinct | ||||
| objects and proceed to do data caching on this basis, caching | ||||
| inconsistencies would arise between the distinct client-side objects | ||||
| that mapped to the same server-side object. | ||||
| </t> | ||||
| <t> | ||||
| By providing a method to differentiate filehandles, the NFSv4.1 | ||||
| protocol alleviates a potential functional regression in comparison | ||||
| with the NFSv3 protocol. Without this method, caching | ||||
| inconsistencies within the same client could occur, and this has not | ||||
| been present in previous versions of the NFS protocol. Note that it | ||||
| is possible to have such inconsistencies with applications executing | ||||
| on multiple clients, but that is not the issue being addressed here. | ||||
| </t> | ||||
| <t> | ||||
| For the purposes of data caching, the following steps allow an | ||||
| NFSv4.1 client to determine whether two distinct filehandles denote | ||||
| the same server-side object: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If GETATTR directed to two filehandles returns different values of the | ||||
| fsid attribute, then the filehandles represent distinct objects. | ||||
| </li> | ||||
| <li> | ||||
| If GETATTR for any file with an fsid that matches the fsid of the two | ||||
| filehandles in question returns a unique_handles attribute with a | ||||
| value of TRUE, then the two objects are distinct. | ||||
| </li> | ||||
| <li> | ||||
| If GETATTR directed to the two filehandles does not return the fileid | ||||
| attribute for both of the handles, then it cannot be determined | ||||
| whether the two objects are the same. Therefore, | ||||
| operations that depend on that knowledge (e.g., | ||||
| client-side data caching) cannot be | ||||
| done reliably. Note that if GETATTR does not return the fileid | ||||
| attribute for both filehandles, it will return it for neither of | ||||
| the filehandles, since the fsid for both filehandles is the same. | ||||
| </li> | ||||
| <li> | ||||
| If GETATTR directed to the two filehandles returns different values | ||||
| for the fileid attribute, then they are distinct objects. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, they are the same object. | ||||
| </li> | ||||
| </ul> | ||||
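| <t> | ||||
| The steps above can be sketched as a single decision function. The names Identity and compare_filehandles are hypothetical, the attribute values are assumed to have been obtained already via GETATTR, and a fileid of None stands for an attribute that the server did not return. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from enum import Enum
from typing import Optional


class Identity(Enum):
    DISTINCT = "distinct"
    SAME = "same"
    UNKNOWN = "unknown"  # caching cannot be done reliably


def compare_filehandles(fsid_a, fsid_b, unique_handles: bool,
                        fileid_a: Optional[int],
                        fileid_b: Optional[int]) -> Identity:
    """Apply the listed steps, in order, to two distinct filehandles."""
    if fsid_a != fsid_b:
        return Identity.DISTINCT      # different file systems
    if unique_handles:
        return Identity.DISTINCT      # one filehandle per object on this fsid
    if fileid_a is None or fileid_b is None:
        return Identity.UNKNOWN       # fileid not returned
    if fileid_a != fileid_b:
        return Identity.DISTINCT
    return Identity.SAME
```
| ]]></sourcecode> | ||||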
| </section> | ||||
| </section> | ||||
| <section anchor="open_delegation" numbered="true" toc="default"> | ||||
| <name>Open Delegation</name> | ||||
| <t> | ||||
| When a file is being OPENed, the server may delegate further handling | ||||
| of opens and closes for that file to the opening client. Any such | ||||
| delegation is recallable since the circumstances that allowed for the | ||||
| delegation are subject to change. In particular, if the server | ||||
| receives a conflicting OPEN from another client, the server must recall | ||||
| the delegation before deciding whether the OPEN from the other client | ||||
| may be granted. Making a delegation is up to the server, and clients | ||||
| should not assume that any particular OPEN either will or will not | ||||
| result in an OPEN delegation. The following is a typical set of | ||||
| conditions that servers might use in deciding whether an OPEN should be | ||||
| delegated: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client must be able to respond to the | ||||
| server's callback requests. If a backchannel | ||||
| has been established, the server will send | ||||
| a CB_COMPOUND request, containing a single | ||||
| operation, CB_SEQUENCE, for a test of backchannel | ||||
| availability. | ||||
| </li> | ||||
| <li> | ||||
| The client must have responded properly to previous recalls. | ||||
| </li> | ||||
| <li> | ||||
| There must be no current OPEN conflicting with the requested | ||||
| delegation. | ||||
| </li> | ||||
| <li> | ||||
| There should be no current delegation that conflicts with the | ||||
| delegation being requested. | ||||
| </li> | ||||
| <li> | ||||
| The probability of future conflicting open requests should be | ||||
| low based on the recent history of the file. | ||||
| </li> | ||||
| <li> | ||||
| There should be no server-specific semantics of OPEN/CLOSE | ||||
| that would make the required handling incompatible with the | ||||
| prescribed handling that the delegated client would apply | ||||
| (see below). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| There are two types of OPEN delegations: OPEN_DELEGATE_READ and OPEN_DELEGATE_WRITE. An OPEN_DELEGATE_READ | ||||
| delegation allows a client to handle, on its own, requests to open a | ||||
| file for reading that do not deny OPEN4_SHARE_ACCESS_READ access to others. Multiple | ||||
| OPEN_DELEGATE_READ delegations may be outstanding simultaneously and do not | ||||
| conflict. An OPEN_DELEGATE_WRITE delegation allows the client to handle, on its | ||||
| own, all opens. Only one OPEN_DELEGATE_WRITE delegation may exist for a given | ||||
| file at a given time, and it is inconsistent with any OPEN_DELEGATE_READ delegations. | ||||
| </t> | ||||
| <t> | ||||
| When a client has an OPEN_DELEGATE_READ delegation, it is assured that | ||||
| neither the contents, the attributes (with the exception of | ||||
| time_access), nor the names of any | ||||
| links to the file will change without its knowledge, so long as the | ||||
| delegation is held. When a client has an OPEN_DELEGATE_WRITE delegation, it | ||||
| may modify the file data locally since no other client will be | ||||
| accessing the file's data. The client holding an OPEN_DELEGATE_WRITE delegation | ||||
| may only locally affect file attributes that are intimately | ||||
| connected with the file data: size, change, time_access, | ||||
| time_metadata, and time_modify. | ||||
| All other attributes must be reflected on the server. | ||||
| </t> | ||||
| <t> | ||||
| When a client has an OPEN delegation, it does not need to send OPENs or | ||||
| CLOSEs to the server. Instead, the client may update the | ||||
| appropriate status internally. For an OPEN_DELEGATE_READ delegation, opens | ||||
| that cannot be handled locally (opens that are for OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH or that | ||||
| deny OPEN4_SHARE_ACCESS_READ access) must be sent to the server. | ||||
| </t> | ||||
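| <t> | ||||
| As a rough sketch of this rule, a client might decide whether an open can be handled locally as shown below. The function name is hypothetical, and the constant values are assumed to match the protocol's XDR definitions. The sketch covers only the delegation-type test; the share reservation and permission checks that the delegate must also apply are omitted. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Constant values assumed to match the NFSv4.1 XDR definitions.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2   # OPEN4_SHARE_ACCESS_BOTH is 0x3
OPEN4_SHARE_DENY_READ = 0x1      # OPEN4_SHARE_DENY_BOTH is 0x3
OPEN_DELEGATE_NONE = 0
OPEN_DELEGATE_READ = 1
OPEN_DELEGATE_WRITE = 2


def can_handle_open_locally(deleg_type, share_access, share_deny):
    """Return True if the open need not be sent to the server."""
    if deleg_type == OPEN_DELEGATE_WRITE:
        return True   # the client handles all opens on its own
    if deleg_type == OPEN_DELEGATE_READ:
        wants_write = bool(share_access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(share_deny & OPEN4_SHARE_DENY_READ)
        return not wants_write and not denies_read
    return False      # no delegation: every OPEN goes to the server
```
| ]]></sourcecode> | ||||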
| <t> | ||||
| When an OPEN delegation is made, the reply to the OPEN contains an | ||||
| OPEN delegation structure that specifies the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| the type of delegation (OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE). | ||||
| </li> | ||||
| <li> | ||||
| space limitation information to control flushing of data on close | ||||
| (OPEN_DELEGATE_WRITE delegation only; | ||||
| see <xref target="open_delegation_caching" format="default"/>) | ||||
| </li> | ||||
| <li> | ||||
| an nfsace4 specifying read and write permissions | ||||
| </li> | ||||
| <li> | ||||
| a stateid to represent the delegation | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The delegation stateid is separate and distinct from the stateid for | ||||
| the OPEN proper. The standard stateid, unlike the delegation stateid, | ||||
| is associated with a particular lock-owner and will continue to be | ||||
| valid after the delegation is recalled, as long as the file remains open. | ||||
| </t> | ||||
| <t> | ||||
| When a request internal to the client is made to open a file and an OPEN | ||||
| delegation is in effect, it will be accepted or rejected solely on the | ||||
| basis of the following conditions. Any requirement for other checks | ||||
| to be made by the delegate should result in the OPEN delegation being | ||||
| denied so that the checks can be made by the server itself. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The access and deny bits for the request and the file as | ||||
| described in <xref target="share_reserve" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| The read and write permissions as determined below. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The nfsace4 passed with delegation can be used to avoid frequent | ||||
| ACCESS calls. The permission check should be as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the nfsace4 indicates that the open may be done, then it should be | ||||
| granted without reference to the server. | ||||
| </li> | ||||
| <li> | ||||
| If the nfsace4 indicates that the open may not be done, then an ACCESS | ||||
| request must be sent to the server to obtain the definitive answer. | ||||
| </li> | ||||
| </ul> | ||||
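| <t> | ||||
| The asymmetry of this check (a positive indication is final, a negative one is only a hint) can be sketched as follows. The DelegationAce class is a deliberately simplified, hypothetical stand-in for nfsace4 that keeps only a mask of permitted access bits and ignores the "who" and flag fields; the Decision names are likewise illustrative. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from enum import Enum

# Access mask bits; values assumed to match ACE4_READ_DATA / ACE4_WRITE_DATA.
ACE4_READ_DATA = 0x1
ACE4_WRITE_DATA = 0x2


class Decision(Enum):
    GRANT_LOCALLY = "grant_locally"   # no reference to the server is needed
    ASK_SERVER = "ask_server"         # send ACCESS for the definitive answer


@dataclass(frozen=True)
class DelegationAce:
    """Hypothetical, simplified view of the nfsace4 returned with a delegation."""
    allowed_mask: int


def check_open_permission(ace: DelegationAce, needed_mask: int) -> Decision:
    """Apply the two-way rule: grant if the ACE allows it, otherwise ask."""
    missing = needed_mask & ~ace.allowed_mask
    if missing == 0:
        return Decision.GRANT_LOCALLY
    # A restrictive ACE is not a denial; the server may still allow the open.
    return Decision.ASK_SERVER
```

| <t> | ||||
| Under this sketch, an ACE with an empty mask sends every open to the server, which matches the case of a server returning an nfsace4 that denies all access. | ||||
| </t> | ||||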
| <t> | ||||
| The server may return an nfsace4 that is more restrictive than the | ||||
| actual ACL of the file. This includes an nfsace4 that specifies | ||||
| denial of all access. Note that some common practices such as mapping | ||||
| the traditional user "root" to the user "nobody" (see <xref target="owner_owner_group" format="default"/>) may make it incorrect | ||||
| to return the actual ACL of the file in the delegation response. | ||||
| </t> | ||||
| <t> | ||||
| The use of a delegation together with various other forms of caching | ||||
| creates the possibility that no server authentication and authorization | ||||
| will ever be | ||||
| performed for a given user since all of the user's requests might be | ||||
| satisfied locally. Where the client is depending on the server for | ||||
| authentication and authorization, the client should ensure that authentication and authorization occur for | ||||
| each user by use of the ACCESS operation. This should be the case | ||||
| even if an ACCESS operation would not be required otherwise. As | ||||
| mentioned before, the server may enforce frequent authentication by | ||||
| returning an nfsace4 denying all access with every OPEN delegation. | ||||
| </t> | ||||
| <section anchor="open_delegation_caching" numbered="true" toc="default"> | ||||
| <name>Open Delegation and Data Caching</name> | ||||
| <t> | ||||
| An OPEN delegation allows much of the message overhead associated with | ||||
| the opening and closing of files to be eliminated. An open when an OPEN | ||||
| delegation is in effect does not require that a validation | ||||
| message be sent to the server. The continued endurance of the | ||||
| OPEN_DELEGATE_READ delegation provides a guarantee that no OPEN | ||||
| for OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH, and thus | ||||
| no write, has occurred. Similarly, when closing a file opened | ||||
| for OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH while an OPEN_DELEGATE_WRITE delegation is in effect, | ||||
| the data written does not have to be written to the server until | ||||
| the OPEN delegation is recalled. The continued endurance of | ||||
| the OPEN delegation provides a | ||||
| guarantee that no open, and thus no READ or WRITE, has been done by | ||||
| another client. | ||||
| </t> | ||||
| <t> | ||||
| For the purposes of OPEN delegation, READs and WRITEs done without an | ||||
| OPEN are treated as the functional equivalents of a corresponding type | ||||
| of OPEN. Although a client <bcp14>SHOULD NOT</bcp14> use special stateids when | ||||
| an open exists, delegation handling on the server can use the | ||||
| client ID associated with the current session to determine if the | ||||
| operation has been done by the holder of the delegation (in which | ||||
| case, no recall is necessary) or by another client (in which case, | ||||
| the delegation must be recalled and I/O must not proceed until the | ||||
| delegation is returned or revoked). | ||||
| </t> | ||||
| <t> | ||||
| With delegations, a client is able to avoid writing data to the server | ||||
| when the CLOSE of a file is serviced. The file close system call is | ||||
| the usual point at which the client is notified of a lack of stable | ||||
| storage for the modified file data generated by the application. At | ||||
| the close, file data is written to the server and, through normal | ||||
| accounting, the server is able to determine if the available file system | ||||
| space for the data has been exceeded (i.e., the server returns | ||||
| NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting includes quotas. | ||||
| The introduction of delegations requires that an alternative method be | ||||
| in place for the same type of communication to occur between client | ||||
| and server. | ||||
| </t> | ||||
| <t> | ||||
| In the delegation response, the server provides either the limit of | ||||
| the size of the file or the number of modified blocks and associated | ||||
| block size. The server must ensure that the client will be able to | ||||
| write modified data to the server of a size equal to that provided in the | ||||
| original delegation. The server must make this assurance for all | ||||
| outstanding delegations. Therefore, the server must be careful in its | ||||
| management of available space for new or modified data, taking into | ||||
| account available file system space and any applicable quotas. The | ||||
| server can recall delegations as a result of managing the available | ||||
| file system space. The client should abide by the server's stated | ||||
| space limits for delegations. If the client exceeds the stated limits | ||||
| for the delegation, the server's behavior is undefined. | ||||
| </t> | ||||
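| <t> | ||||
| A client-side check against the two forms of space limitation might look like the sketch below. The SpaceLimit class and both helper functions are hypothetical names introduced for illustration; the sketch assumes the client tracks which blocks it has modified and simply decides whether modified data may remain cached past close. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class SpaceLimit:
    """Hypothetical model of the limit carried in the delegation response.

    Exactly one form is expected: a cap on the file size, or a cap on the
    number of modified blocks together with the block size.
    """
    filesize: Optional[int] = None
    num_blocks: Optional[int] = None
    bytes_per_block: Optional[int] = None


def blocks_touched(offset: int, length: int, bytes_per_block: int) -> Set[int]:
    """Return the indices of the blocks covered by one locally cached write."""
    if length <= 0:
        return set()
    first = offset // bytes_per_block
    last = (offset + length - 1) // bytes_per_block
    return set(range(first, last + 1))


def may_defer_flush(limit: SpaceLimit, file_size: int, modified: Set[int]) -> bool:
    """True if modified data may stay cached on close without exceeding the limit."""
    if limit.filesize is not None:
        return file_size <= limit.filesize
    return len(modified) <= limit.num_blocks
```

| <t> | ||||
| In this sketch, a size limit of zero makes may_defer_flush false for any non-empty file, so modified data is always written to the server on close. | ||||
| </t> | ||||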
| <t> | ||||
| Based on server conditions, quotas, or available file system space, the | ||||
| server may grant OPEN_DELEGATE_WRITE delegations with very restrictive space | ||||
| limitations. The limitations may be defined in a way that will always | ||||
| force modified data to be flushed to the server on close. | ||||
| </t> | ||||
| <t> | ||||
| With respect to authentication, flushing modified data to the server | ||||
| after a CLOSE has occurred may be problematic. For example, the user | ||||
| of the application may have logged off the client, and unexpired | ||||
| authentication credentials may not be present. In this case, the | ||||
| client may need to take special care to ensure that local unexpired | ||||
| credentials will in fact be available. This may be accomplished by | ||||
| tracking the expiration time of credentials and flushing data well in | ||||
| advance of their expiration or by making private copies of credentials | ||||
| to assure their availability when needed. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Open Delegation and File Locks</name> | ||||
| <t> | ||||
| When a client holds an OPEN_DELEGATE_WRITE delegation, lock operations are | ||||
| performed locally. This includes those required for mandatory byte-range | ||||
| locking. This can be done since the delegation implies that there can | ||||
| be no conflicting locks. Similarly, all of the revalidations that | ||||
| would normally be associated with obtaining locks and the flushing of | ||||
| data associated with the releasing of locks need not be done. | ||||
| </t> | ||||
| <t> | ||||
| When a client holds an OPEN_DELEGATE_READ delegation, lock operations are not | ||||
| performed locally. All lock operations, including those requesting | ||||
| non-exclusive locks, are sent to the server for resolution. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="handling_cb_getattr" numbered="true" toc="default"> | ||||
| <name>Handling of CB_GETATTR</name> | ||||
| <t> | ||||
| The server needs to employ special handling for a GETATTR where the | ||||
| target is a file that has an OPEN_DELEGATE_WRITE delegation in effect. The | ||||
| reason for this is that the client holding the OPEN_DELEGATE_WRITE delegation may | ||||
| have modified the data, and the server needs to reflect this change to | ||||
| the second client that submitted the GETATTR. Therefore, the client | ||||
| holding the OPEN_DELEGATE_WRITE delegation needs to be interrogated. The server | ||||
| will use the CB_GETATTR operation. The only attributes that the | ||||
| server can reliably query via CB_GETATTR are size and change. | ||||
| </t> | ||||
| <t> | ||||
| Since CB_GETATTR is being used to satisfy another client's GETATTR | ||||
| request, the server only needs to know if the client holding the | ||||
| delegation has a modified version of the file. If the client's copy | ||||
| of the delegated file is not modified (data or size), the server can | ||||
| satisfy the second client's GETATTR request from the attributes stored | ||||
| locally at the server. If the file is modified, the server only needs | ||||
| to know about this modified state. If the server determines that the | ||||
| file is currently modified, it will respond to the second client's | ||||
| GETATTR as if the file had been modified locally at the server. | ||||
| </t> | ||||
| <t> | ||||
| Since the form of the change attribute is determined by the server and | ||||
| is opaque to the client, the client and server need to agree on a | ||||
| method of communicating the modified state of the file. For the size | ||||
| attribute, the client will report its current view of the file size. | ||||
| For the change attribute, the handling is more involved. | ||||
| </t> | ||||
| <t> | ||||
| For the client, the following steps will be taken when receiving an | ||||
| OPEN_DELEGATE_WRITE delegation: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The value of the change attribute will be obtained from the server and | ||||
| cached. Let this value be represented by c. | ||||
| </li> | ||||
| <li> | ||||
| The client will create a value greater than c that will be used for | ||||
| communicating that modified data is held at the client. Let this value be | ||||
| represented by d. | ||||
| </li> | ||||
| <li> | ||||
| When the client is queried via CB_GETATTR for the change attribute, it | ||||
| checks to see if it holds modified data. If the file is modified, the | ||||
| value d is returned for the change attribute value. If this file is | ||||
| not currently modified, the client returns the value c for the change | ||||
| attribute. | ||||
| </li> | ||||
| </ul> | ||||
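| <t> | ||||
| The client steps above can be sketched as a small state holder. The class and method names are hypothetical; the sketch takes d to be c + 1, which is one permissible choice, and assumes c is below the maximum representable value. | ||||
| </t> | ||||

```python
class WriteDelegationChangeState:
    """Hypothetical client-side record of the change values c and d."""

    def __init__(self, server_change: int) -> None:
        # c: change attribute obtained from the server and cached.
        self.c = server_change
        # d: any value greater than c; c + 1 is the simplest choice.
        self.d = server_change + 1
        self.holds_modified_data = False

    def note_local_modification(self) -> None:
        """Record that the cache now holds data the server has not seen."""
        self.holds_modified_data = True

    def cb_getattr_change(self) -> int:
        """Value to return for the change attribute in a CB_GETATTR reply."""
        return self.d if self.holds_modified_data else self.c
```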
| <t> | ||||
| For simplicity of implementation, the client <bcp14>MAY</bcp14> for each CB_GETATTR | ||||
| return the same value d. This is true even if, between successive | ||||
| CB_GETATTR operations, the client again modifies the file's data or | ||||
| metadata in its cache. The client can return the same value because | ||||
| the only requirement is that the client be able to indicate to the | ||||
| server that the client holds modified data. Therefore, the value of d | ||||
| may always be c + 1. | ||||
| </t> | ||||
| <t> | ||||
| While the change attribute is opaque to the client in the sense that | ||||
| it has no idea what units of time, if any, the server is counting | ||||
| change with, it is not opaque in that the client has to treat it as an | ||||
| unsigned integer, and the server has to be able to see the results of | ||||
| the client's changes to that integer. Therefore, the server <bcp14>MUST</bcp14> | ||||
| encode the change attribute in network order when sending it to the | ||||
| client. The client <bcp14>MUST</bcp14> decode it from network order to its native | ||||
| order when receiving it, and the client <bcp14>MUST</bcp14> encode it in network order | ||||
| when sending it to the server. For this reason, change is defined as | ||||
| an unsigned integer rather than an opaque array of bytes. | ||||
| </t> | ||||
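| <t> | ||||
| The byte-order requirement can be illustrated with the following sketch, which assumes the change attribute is carried as a 64-bit unsigned integer in network (big-endian) order. The function names are hypothetical. | ||||
| </t> | ||||

```python
import struct

# "!Q" selects network byte order and an unsigned 64-bit integer.
# The 64-bit width is an assumption of this sketch.
_CHANGE_FORMAT = "!Q"


def decode_change(wire: bytes) -> int:
    """Convert the on-the-wire change attribute to a native integer."""
    (value,) = struct.unpack(_CHANGE_FORMAT, wire)
    return value


def encode_change(value: int) -> bytes:
    """Convert a native integer back to network order for sending."""
    return struct.pack(_CHANGE_FORMAT, value)


def modified_marker(wire_c: bytes) -> bytes:
    """Derive d = c + 1 from the received value c and re-encode it."""
    return encode_change(decode_change(wire_c) + 1)
```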
| <t> | ||||
| For the server, the following steps will be taken when providing an | ||||
| OPEN_DELEGATE_WRITE delegation: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Upon providing an OPEN_DELEGATE_WRITE delegation, the server will cache a copy of the | ||||
| change attribute in the data structure it uses to record the | ||||
| delegation. Let this value be represented by sc. | ||||
| </li> | ||||
| <li> | ||||
| When a second client sends a GETATTR operation on the same file to the | ||||
| server, the server obtains the change attribute from the first client. | ||||
| Let this value be cc. | ||||
| </li> | ||||
| <li> | ||||
| If the value cc is equal to sc, the file is not modified and the | ||||
| server returns the current values for change, time_metadata, and | ||||
| time_modify (for example) to the second client. | ||||
| </li> | ||||
| <li> | ||||
| If the value cc is NOT equal to sc, the file is currently modified at | ||||
| the first client and most likely will be modified at the server at a | ||||
| future time. The server then uses its current time to construct | ||||
| attribute values for time_metadata and time_modify. A new value of | ||||
| sc, which we will call nsc, is computed by the server, such that nsc | ||||
| >= sc + 1. The server then returns the constructed time_metadata, | ||||
| time_modify, and nsc values to the requester. The server replaces sc | ||||
| in the delegation record with nsc. To prevent the possibility of | ||||
| time_modify, time_metadata, and change from appearing to go backward | ||||
| (which would happen if the client holding the delegation fails to | ||||
| write its modified data to the server before the delegation is revoked | ||||
| or returned), the server <bcp14>SHOULD</bcp14> update the file's metadata record with | ||||
| the constructed attribute values. For performance reasons, | ||||
| committing the constructed attribute values to stable | ||||
| storage is <bcp14>OPTIONAL</bcp14>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As discussed earlier in this section, the client <bcp14>MAY</bcp14> return the same | ||||
| cc value on subsequent CB_GETATTR calls, even if the file was modified | ||||
| in the client's cache yet again between successive CB_GETATTR calls. | ||||
| Therefore, the server must assume that the file has been modified yet | ||||
| again, and <bcp14>MUST</bcp14> take care to ensure that the new nsc it constructs and | ||||
| returns is greater than the previous nsc it returned. An example | ||||
| implementation's delegation record would satisfy this mandate by | ||||
| including a boolean field (let us call it "modified") that is set to | ||||
| FALSE when the delegation is granted, and an sc value set at the time | ||||
| of grant to the change attribute value. The modified field would be | ||||
| set to TRUE the first time cc != sc, and would stay TRUE until the | ||||
| delegation is returned or revoked. The processing for constructing | ||||
| nsc, time_modify, and time_metadata would use this pseudo code: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| if (!modified) { | ||||
|     do CB_GETATTR for change and size; | ||||
|     if (cc != sc) | ||||
|         modified = TRUE; | ||||
| } else { | ||||
|     do CB_GETATTR for size; | ||||
| } | ||||
| if (modified) { | ||||
|     sc = sc + 1; | ||||
|     time_modify = time_metadata = current_time; | ||||
|     update sc, time_modify, time_metadata into file's metadata; | ||||
| }]]></sourcecode> | ||||
| <t> | ||||
| This would return to the client (that sent GETATTR) the attributes | ||||
| it requested, but make sure size comes from what | ||||
| CB_GETATTR returned. The server would not update the file's | ||||
| metadata with the client's modified size. | ||||
| </t> | ||||
| <t> | ||||
| In the case that the file attribute size is different from the | ||||
| server's current value, the server treats this as a modification | ||||
| regardless of the value of the change attribute retrieved via | ||||
| CB_GETATTR and responds to the second client as in the last step. | ||||
| </t> | ||||
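| <t> | ||||
| The server-side processing described in this section, including the treatment of a size difference as a modification, might be sketched as follows. All class, function, and parameter names are hypothetical, and the CB_GETATTR exchange is represented by a caller-supplied callback rather than a real backchannel call. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DelegationRecord:
    sc: int                  # change value cached when the delegation was granted
    modified: bool = False   # stays True once a modification has been detected


@dataclass
class FileMetadata:
    change: int
    size: int
    time_modify: float
    time_metadata: float


# Hypothetical callback: argument says whether change is requested;
# result is (cc or None, size) as reported by the delegation holder.
CbGetattr = Callable[[bool], Tuple[Optional[int], int]]


def getattr_for_second_client(rec: DelegationRecord, meta: FileMetadata,
                              cb_getattr: CbGetattr, now: float) -> Dict[str, object]:
    """Build the attributes returned to the client that sent GETATTR."""
    if not rec.modified:
        cc, size = cb_getattr(True)
        # A differing size counts as a modification regardless of cc.
        if cc != rec.sc or size != meta.size:
            rec.modified = True
    else:
        _, size = cb_getattr(False)
    if rec.modified:
        rec.sc += 1                      # each reply gets a strictly larger value
        meta.change = rec.sc
        meta.time_modify = meta.time_metadata = now
    # The reported size comes from CB_GETATTR; meta.size is left untouched.
    return {"change": meta.change, "size": size,
            "time_modify": meta.time_modify, "time_metadata": meta.time_metadata}
```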
| <t> | ||||
| This methodology resolves issues of clock differences between client | ||||
| and server and other scenarios where the use of CB_GETATTR breaks down. | ||||
| </t> | ||||
| <t> | ||||
| It should be noted that the server is under no obligation to use | ||||
| CB_GETATTR, and therefore the server <bcp14>MAY</bcp14> simply recall the delegation | ||||
| to avoid its use. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Recall of Open Delegation</name> | ||||
| <t> | ||||
| The following events necessitate recall of an OPEN delegation: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| potentially conflicting OPEN request (or a READ or WRITE operation | ||||
| done with a special stateid) | ||||
| </li> | ||||
| <li> | ||||
| SETATTR sent by another client | ||||
| </li> | ||||
| <li> | ||||
| REMOVE request for the file | ||||
| </li> | ||||
| <li> | ||||
| RENAME request for the file as either the source or target of the RENAME | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Whether a RENAME of a directory in the path leading to the file | ||||
| results in recall of an OPEN delegation depends on the semantics of | ||||
| the server's file system. If that file system denies such RENAMEs when | ||||
| a file is open, the recall must be performed to determine whether the | ||||
| file in question is, in fact, open. | ||||
| </t> | ||||
| <t> | ||||
| In addition to the situations above, the server may choose to recall | ||||
| OPEN delegations at any time if resource constraints make it advisable | ||||
| to do so. Clients should always be prepared for the possibility of | ||||
| recall. | ||||
| </t> | ||||
| <t> | ||||
| When a client receives a recall for an OPEN delegation, it needs | ||||
| to update state on the server before returning the delegation. | ||||
| These same updates must be done whenever a client chooses to | ||||
| return a delegation voluntarily. The following items of state | ||||
| need to be dealt with: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the file associated with the delegation is no longer open and no | ||||
| previous CLOSE operation has been sent to the server, a CLOSE | ||||
| operation must be sent to the server. | ||||
| </li> | ||||
| <li> | ||||
| If a file has other open references at the client, then OPEN | ||||
| operations must be sent to the server. The appropriate stateids will | ||||
| be provided by the server for subsequent use by the client since the | ||||
| delegation stateid will no longer be valid. These OPEN requests are | ||||
| done with the claim type of CLAIM_DELEGATE_CUR. This will allow the | ||||
| presentation of the delegation stateid so that the client can | ||||
| establish the appropriate rights to perform the OPEN. (See | ||||
| <xref target="OP_OPEN" format="default"/>, which describes the OPEN operation, | ||||
| for details.) | ||||
| </li> | ||||
| <li> | ||||
| If there are granted byte-range locks, the corresponding LOCK operations | ||||
| need to be performed. This applies to the OPEN_DELEGATE_WRITE delegation case | ||||
| only. | ||||
| </li> | ||||
| <li> | ||||
| For an OPEN_DELEGATE_WRITE delegation, if | ||||
| at the time of recall the file is not open for | ||||
| OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH, all modified | ||||
| data for the file must be flushed to the | ||||
| server. If the delegation had not existed, the client would have done | ||||
| this data flush before the CLOSE operation. | ||||
| </li> | ||||
| <li> | ||||
| For an OPEN_DELEGATE_WRITE delegation when a file is still open at the time of | ||||
| recall, any modified data for the file needs to be flushed to the | ||||
| server. | ||||
| </li> | ||||
| <li> | ||||
| With the OPEN_DELEGATE_WRITE delegation in place, it is possible that the file | ||||
| was truncated while the delegation was in effect. For example, the | ||||
| truncation could have occurred as a result of an OPEN UNCHECKED with a | ||||
| size attribute value of zero. Therefore, if a truncation of | ||||
| the file has occurred and this operation has not been propagated to | ||||
| the server, the truncation must occur before any modified data is | ||||
| written to the server. | ||||
| </li> | ||||
| </ul> | ||||
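| <t> | ||||
| One possible ordering of these updates ahead of the delegation return is sketched below. The state class and the step labels are hypothetical; in particular, representing the propagation of a truncation as "SETATTR(size)" and the data flush as a single "WRITE" are assumptions of this sketch. The only ordering constraints taken from the text are that truncation precedes the writing of modified data and that everything precedes the return of the delegation. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelegatedFileState:
    """Hypothetical summary of client state covered by one delegation."""
    write_delegation: bool
    local_opens: int = 0             # opens still referenced on the client
    close_owed: bool = False         # an OPEN was sent earlier and never closed
    local_locks: int = 0             # byte-range locks granted locally
    dirty: bool = False              # modified data is cached
    truncated_locally: bool = False  # truncation not yet seen by the server


def return_plan(s: DelegatedFileState) -> List[str]:
    """List the operations to send, in order, ending with DELEGRETURN."""
    steps: List[str] = []
    # Re-establish each remaining open, presenting the delegation stateid.
    steps += ["OPEN(CLAIM_DELEGATE_CUR)"] * s.local_opens
    if s.write_delegation:
        # Locally granted byte-range locks exist only under a write delegation.
        steps += ["LOCK"] * s.local_locks
        if s.truncated_locally:
            steps.append("SETATTR(size)")   # truncation goes before any data
        if s.dirty:
            steps.append("WRITE")
    if s.local_opens == 0 and s.close_owed:
        steps.append("CLOSE")
    steps.append("DELEGRETURN")
    return steps
```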
| <t> | ||||
| In the case of OPEN_DELEGATE_WRITE delegation, byte-range locking imposes some | ||||
| additional requirements. To precisely maintain the associated | ||||
| invariant, it is required to flush any modified data in any byte-range for | ||||
| which a WRITE_LT lock was released while the OPEN_DELEGATE_WRITE delegation was in | ||||
| effect. However, because the OPEN_DELEGATE_WRITE delegation implies no other | ||||
| locking by other clients, a simpler implementation is to flush all | ||||
| modified data for the file (as described just above) if any WRITE_LT lock | ||||
| has been released while the OPEN_DELEGATE_WRITE delegation was in effect. | ||||
| </t> | ||||
| <t> | ||||
| An implementation need not wait until delegation recall (or | ||||
| the decision to voluntarily return a delegation) to perform any of the above | ||||
| actions, if implementation considerations (e.g., resource availability | ||||
| constraints) make that desirable. Generally, however, the fact that | ||||
| the actual OPEN state of the file may continue to change makes it not | ||||
| worthwhile to send information about opens and closes to the server, | ||||
| except as part of delegation return. An exception is | ||||
| when the client has no more internal opens of the file. In this | ||||
| case, sending a CLOSE is useful because it | ||||
| reduces resource utilization on the client | ||||
| and server. | ||||
| Regardless of the client's choices on scheduling these | ||||
| actions, all must be performed before the delegation is returned, | ||||
| including (when applicable) the close that corresponds to the OPEN | ||||
| that resulted in the delegation. These actions can be performed | ||||
| either in previous requests or in previous operations in the same | ||||
| COMPOUND request. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Clients That Fail to Honor Delegation Recalls</name> | ||||
| <t> | ||||
| A client may fail to respond to a recall for various reasons, such as | ||||
| a failure of the backchannel from server to the client. The client | ||||
| may be unaware of a failure in the backchannel. This lack of | ||||
| awareness could result in the client finding out long after the | ||||
| failure that its delegation has been revoked, and another client has | ||||
| modified the data for which the client had a delegation. This is | ||||
| especially a problem for the client that held an OPEN_DELEGATE_WRITE delegation. | ||||
| </t> | ||||
| <t> | ||||
| Status bits returned by SEQUENCE operations help to provide an | ||||
| alternate way of informing the client of issues regarding the | ||||
| status of the backchannel and of recalled delegations. When the | ||||
| backchannel is not available, the server returns the status bit | ||||
| SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can | ||||
| react by attempting to re-establish the backchannel and by | ||||
| returning recallable objects if a backchannel cannot be successfully | ||||
| re-established. | ||||
| </t> | ||||
| <t> | ||||
| Whether the backchannel is functioning or not, it may be that the | ||||
| recalled delegation is not returned. Note that the client's lease | ||||
| might still be renewed, even though the recalled delegation is not | ||||
| returned. In this situation, servers <bcp14>SHOULD</bcp14> revoke delegations that | ||||
| are not returned in a period of time equal to the lease period. This | ||||
| period of time should allow the client time to note the | ||||
| backchannel-down status and re-establish the backchannel. | ||||
| </t> | ||||
| <t> | ||||
| When delegations are revoked, the server will set the | ||||
| SEQ4_STATUS_RECALLABLE_STATE_REVOKED status bit in its replies to subsequent | ||||
| SEQUENCE operations. The client should note this and then use | ||||
| TEST_STATEID to find which delegations have been revoked. | ||||
| </t> | ||||
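| <t> | ||||
| A client's reaction to the two status bits discussed in this section might be sketched as follows. The bit values are those recalled from the protocol's XDR description and should be verified against it; the function name and the action labels are hypothetical. | ||||
| </t> | ||||

```python
from typing import List

# Status bit values assumed from the protocol's XDR description.
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040


def actions_for_sequence_status(status_flags: int) -> List[str]:
    """Map the status flags of a SEQUENCE reply to follow-up actions."""
    actions: List[str] = []
    if status_flags & SEQ4_STATUS_CB_PATH_DOWN:
        # Recalls cannot arrive; try to restore the backchannel, and return
        # recallable objects if that does not succeed.
        actions.append("reestablish_backchannel")
    if status_flags & SEQ4_STATUS_RECALLABLE_STATE_REVOKED:
        # Find out which delegations were revoked.
        actions.append("test_stateid_on_delegations")
    return actions
```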
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Delegation Revocation</name> | ||||
| <t> | ||||
| At the point a delegation is revoked, if there are associated opens | ||||
| on the client, these opens may or may not be revoked. If no | ||||
| byte-range lock or open is granted that is inconsistent with the existing open, | ||||
| the stateid for the open may remain valid and be disconnected | ||||
| from the revoked delegation, just as would be the case if the | ||||
| delegation were returned. | ||||
| </t> | ||||
| <t> | ||||
| For example, if an OPEN for OPEN4_SHARE_ACCESS_BOTH with a deny of OPEN4_SHARE_DENY_NONE is | ||||
| associated with the delegation, granting of another such OPEN | ||||
| to a different client will revoke the delegation but need not | ||||
| revoke the OPEN, since the two OPENs are consistent with each other. | ||||
| On the other hand, if an OPEN denying write access is | ||||
| granted, then the existing OPEN must be revoked. | ||||
| </t> | ||||
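| <t> | ||||
| The consistency test used in this example can be sketched as a check of access bits against deny bits in both directions. The function name is hypothetical, and the constants use the values recalled from the protocol's XDR description. | ||||
| </t> | ||||

```python
# Share access/deny bits; values assumed to match the protocol's XDR description.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2
OPEN4_SHARE_DENY_BOTH = 0x3


def opens_conflict(old_access: int, old_deny: int,
                   new_access: int, new_deny: int) -> bool:
    """True if the newly granted OPEN is inconsistent with the existing one.

    Two opens conflict when either one requests an access mode that the
    other denies. The access and deny bit positions line up (read = 0x1,
    write = 0x2), so a bitwise AND is sufficient.
    """
    return bool((new_access & old_deny) or (old_access & new_deny))
```

| <t> | ||||
| With an existing OPEN for OPEN4_SHARE_ACCESS_BOTH and OPEN4_SHARE_DENY_NONE, a second identical OPEN yields no conflict, so the existing OPEN can survive revocation of the delegation; a new OPEN with OPEN4_SHARE_DENY_WRITE does conflict, so the existing OPEN must be revoked. | ||||
| </t> | ||||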
| <t> | ||||
| When opens and/or locks are revoked, | ||||
| the applications holding these opens or locks need to be notified. | ||||
| This notification usually occurs by returning errors for READ/WRITE | ||||
| operations or when a close is attempted for the open file. | ||||
| </t> | ||||
| <t> | ||||
| If no opens exist for the file at the point the delegation is revoked, | ||||
| then notification of the revocation is unnecessary. However, if there | ||||
| is modified data present at the client for the file, the user of the | ||||
| application should be notified. Unfortunately, it may not be possible | ||||
| to notify the user since active applications may not be present at the | ||||
| client. See <xref target="revocation_recovery_write" format="default"/> | ||||
| for additional details. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="via_want_delegation" numbered="true" toc="default"> | ||||
| <name>Delegations via WANT_DELEGATION</name> | ||||
| <t> | ||||
| In addition to providing delegations as part of the reply | ||||
| to OPEN operations, servers <bcp14>MAY</bcp14> provide delegations | ||||
| separate from open, via the <bcp14>OPTIONAL</bcp14> WANT_DELEGATION operation. This | ||||
| allows delegations to be obtained in advance of an OPEN that | ||||
| might benefit from them, for objects that are not a valid target | ||||
| of OPEN, or to deal with cases in which a | ||||
| delegation has been recalled and the client wants to make | ||||
| an attempt to re-establish it if the absence of use by other | ||||
| clients allows that. | ||||
| </t> | ||||
| <t> | ||||
| The WANT_DELEGATION operation may be performed on any type of | ||||
| file object other than a directory. | ||||
| </t> | ||||
| <t> | ||||
| When a delegation is obtained using WANT_DELEGATION, any open | ||||
| files for the same filehandle held by that client are to be | ||||
| treated as subordinate to the delegation, just as if they had | ||||
| been created using an OPEN of type CLAIM_DELEGATE_CUR. They are | ||||
| otherwise unchanged as to seqid, access and deny modes, and the | ||||
| relationship with byte-range locks. Similarly, because | ||||
| existing byte-range | ||||
| locks are subordinate to an open, those byte-range locks also become | ||||
| indirectly subordinate to that new delegation. | ||||
| </t> | ||||
| <t> | ||||
| The WANT_DELEGATION operation provides for delivery of delegations | ||||
| via callbacks, when the delegations are not immediately available. | ||||
| When a requested delegation is available, it is delivered to the | ||||
| client via a CB_PUSH_DELEG operation. When this happens, open files | ||||
| for the same filehandle become subordinate to the new delegation | ||||
| at the point at which the delegation is delivered, just as if they had | ||||
| been created using an OPEN of type CLAIM_DELEGATE_CUR. | ||||
| Similarly, this occurs for existing byte-range locks subordinate to an open. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="data_caching_revocation" numbered="true" toc="default"> | ||||
| <name>Data Caching and Revocation</name> | ||||
| <t> | ||||
| When locks and delegations are revoked, the assumptions upon which | ||||
| successful caching depends are no longer guaranteed. For any locks or | ||||
| share reservations that have been revoked, the corresponding state-owner | ||||
| needs to be notified. This notification includes applications with a | ||||
| file open that has a corresponding delegation that has been revoked. | ||||
| Cached data associated with the revocation must be removed from the | ||||
| client. In the case of modified data existing in the client's cache, | ||||
| that data must be removed from the client without being written to | ||||
| the server. As mentioned, the assumptions made by the client are no | ||||
| longer valid at the point when a lock or delegation has been revoked. | ||||
| For example, another client may have been granted a conflicting byte-range lock | ||||
| after the revocation of the byte-range lock at the first client. Therefore, the | ||||
| data within the lock range may have been modified by the other client. | ||||
| Obviously, the first client is unable to guarantee to the application | ||||
| what has occurred to the file in the case of revocation. | ||||
| </t> | ||||
| <t> | ||||
| Notification to a state-owner will in many cases consist of simply | ||||
| returning an error on the next and all subsequent READs/WRITEs to the | ||||
| open file or on the close. Where the methods available to a client | ||||
| make such notification impossible because errors for certain | ||||
| operations may not be returned, more drastic action such as signals or | ||||
| process termination may be appropriate. The justification here is | ||||
| that an invariant on which an application depends may be violated. | ||||
| Depending on how errors are typically treated for the client-operating | ||||
| environment, further levels of notification including logging, console | ||||
| messages, and GUI pop-ups may be appropriate. | ||||
| </t> | ||||
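| <t> | ||||
| The handling described above, in which cached data is discarded without being written back and later operations on the open file fail, might be sketched as follows. The class, its methods, and the exception type are hypothetical, and the cache is reduced to two dictionaries keyed by offset. | ||||
| </t> | ||||

```python
from typing import Dict, Optional


class StateRevokedError(OSError):
    """Hypothetical error surfaced to the application after revocation."""


class CachedOpenFile:
    """Hypothetical client-side cache for one open file."""

    def __init__(self) -> None:
        self.clean: Dict[int, bytes] = {}   # data known to match the server
        self.dirty: Dict[int, bytes] = {}   # modified data not yet written
        self.revoked = False

    def _check(self) -> None:
        if self.revoked:
            raise StateRevokedError("lock or delegation was revoked")

    def write(self, offset: int, data: bytes) -> None:
        self._check()
        self.dirty[offset] = data

    def read(self, offset: int) -> Optional[bytes]:
        self._check()
        return self.dirty.get(offset, self.clean.get(offset))

    def on_revocation(self) -> int:
        """Drop all cached data without writing it; return the dirty count."""
        discarded = len(self.dirty)
        self.clean.clear()
        self.dirty.clear()
        self.revoked = True
        return discarded
```

| <t> | ||||
| Returning the number of discarded modified entries gives the caller something to log or report, in line with the further levels of notification mentioned above. | ||||
| </t> | ||||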
| <section anchor="revocation_recovery_write" numbered="true" toc="default"> | ||||
| <name>Revocation Recovery for Write Open Delegation</name> | ||||
| <t> | ||||
| Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the special | ||||
| issue of modified data in the client cache while the file is not open. | ||||
| In this situation, any client that does not flush modified data to | ||||
| the server on each close must ensure that the user receives | ||||
| appropriate notification of the failure as a result of the revocation. | ||||
| Since such situations may require human action to correct problems, | ||||
| notification schemes in which the appropriate user or administrator is | ||||
| notified may be necessary. Logging and console messages are typical | ||||
| examples. | ||||
| </t> | ||||
| <t> | ||||
| If there is modified data on the client, it must not be flushed | ||||
| normally to the server. A client may attempt to provide a copy of the | ||||
| file data as modified during the delegation under a different name in | ||||
| the file system namespace to ease recovery. Note that when the | ||||
| client can determine that the file has not been modified by any other | ||||
| client, or when the client has a complete cached copy of the file in | ||||
| question, such a saved copy of the client's view of the file may be of | ||||
| particular value for recovery. In another case, recovery using a copy | ||||
| of the file based partially on the client's cached data and partially | ||||
| on the server's copy as modified by other clients will be anything but | ||||
| straightforward, so clients may avoid saving file contents in these | ||||
| situations or specially mark the results to warn users of possible | ||||
| problems. | ||||
| </t> | ||||
| <t> | ||||
| Saving of such modified data in delegation revocation situations | ||||
| may be limited to files of a certain size or might be used only when | ||||
| sufficient disk space is available within the target file system. | ||||
| Such saving may also be restricted to situations when the client has | ||||
| sufficient buffering resources to keep the cached copy available | ||||
| until it is properly stored to the target file system. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Attribute Caching</name> | ||||
| <t> | ||||
| This section pertains to the caching of a file's attributes on a client | ||||
| when that client does not hold a delegation on the file. | ||||
| </t> | ||||
| <t> | ||||
| The attributes discussed in this section do not include named | ||||
| attributes. Individual named attributes are analogous to files, and | ||||
| caching of the data for these needs to be handled just as data caching | ||||
| is for ordinary files. Similarly, LOOKUP results from an OPENATTR | ||||
| directory (as well as the directory's contents) are to be cached on | ||||
| the same basis as any other pathnames. | ||||
| </t> | ||||
| <t> | ||||
| Clients may cache file attributes obtained from the server and use | ||||
| them to avoid subsequent GETATTR requests. Such caching is write | ||||
| through in that modification to file attributes is always done by | ||||
| means of requests to the server and should not be done locally and | ||||
| should not be cached. The exception is modification of attributes that | ||||
| are intimately connected with data caching. For example, extending a | ||||
| file by writing data to the local data cache is reflected immediately | ||||
| in the size as seen on the client without this change being | ||||
| immediately reflected on the server. Normally, such changes are not | ||||
| propagated directly to the server, but when the modified data is | ||||
| flushed to the server, analogous attribute changes are made on the | ||||
| server. When OPEN delegation is in effect, the modified attributes | ||||
| may be returned to the server in reaction to a CB_RECALL call. | ||||
| </t> | ||||
| <t> | ||||
| The result of local caching of attributes is that the attribute | ||||
| caches maintained on individual clients will not be coherent. | ||||
| Changes made in one order on the server may be seen in a different | ||||
| order on one client and in a third order on another client. | ||||
| </t> | ||||
| <t> | ||||
| The typical file system application programming interfaces do not | ||||
| provide means to atomically modify or interrogate attributes for | ||||
| multiple files at the same time. The following rules provide an | ||||
| environment where the potential incoherencies mentioned above can be | ||||
| reasonably managed. These rules are derived from the practice of | ||||
| previous NFS protocols. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| All attributes for a given file (per-fsid attributes excepted) are | ||||
| cached as a unit at the client so that no non-serializability can | ||||
| arise within the context of a single file. | ||||
| </li> | ||||
| <li> | ||||
| An upper time boundary is maintained on how long a client cache entry | ||||
| can be kept without being refreshed from the server. | ||||
| </li> | ||||
| <li> | ||||
| When operations are performed that change attributes at the server, | ||||
| the updated attribute set is requested as part of the containing RPC. | ||||
| This includes directory operations that update attributes indirectly. | ||||
| This is accomplished by following the modifying operation with a | ||||
| GETATTR operation and then using the results of the GETATTR to update | ||||
| the client's cached attributes. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that if the full set of attributes to be cached is requested by | ||||
| READDIR, the results can be cached by the client on the same basis as | ||||
| attributes obtained via GETATTR. | ||||
| </t> | ||||
| <t> | ||||
| A client may validate its cached version of attributes for a file by | ||||
| fetching both the change and time_access attributes and assuming | ||||
| that if the change attribute has the same value as it did when the | ||||
| attributes were cached, then no attributes other than time_access have | ||||
| changed. The time_access attribute is also fetched because many | ||||
| servers operate in environments where the operation that updates | ||||
| change does not update time_access. For example, POSIX file semantics | ||||
| do not update access time when a file is modified by the write system | ||||
| call <xref target="write_atime" format="default"/>. Therefore, the client that wants a current time_access value | ||||
| should fetch it with change during the attribute cache validation | ||||
| processing and update its cached time_access. | ||||
| </t> | ||||
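| <t> | ||||
| The validation step described above can be sketched as follows. This is | ||||
| a minimal illustration, not a normative algorithm: the CachedAttrs | ||||
| layout and the revalidate function are hypothetical names, and the two | ||||
| fetched values stand for the results of a GETATTR that requests only | ||||
| change and time_access. | ||||
| </t> | ||||

```python
from dataclasses import dataclass


@dataclass
class CachedAttrs:
    """Client-side attribute cache entry (hypothetical layout)."""
    change: int        # value of the change attribute when cached
    time_access: int   # last known access time
    other: dict        # remaining cached attributes, kept as a unit


def revalidate(entry, fetched_change, fetched_time_access):
    """Return True if cached attributes other than time_access are still usable.

    fetched_change and fetched_time_access stand for the results of a
    GETATTR requesting only the change and time_access attributes.
    """
    # time_access may move even when change does not, so always refresh it.
    entry.time_access = fetched_time_access
    if fetched_change == entry.change:
        return True    # nothing other than time_access has changed
    # The file changed on the server; the caller must refetch everything.
    entry.change = fetched_change
    entry.other.clear()
    return False
```

| <t> | ||||
| A caller would refetch the full attribute set whenever revalidate | ||||
| returns False, and otherwise continue to use the cached values. | ||||
| </t> | ||||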
| <t> | ||||
| The client may maintain a cache of modified attributes for those | ||||
| attributes intimately connected with data of modified regular files | ||||
| (size, time_modify, and change). Other than those three attributes, | ||||
| the client <bcp14>MUST NOT</bcp14> maintain a cache of modified attributes. Instead, | ||||
| attribute changes are immediately sent to the server. | ||||
| </t> | ||||
| <t> | ||||
| In some operating environments, the equivalent to time_access is | ||||
| expected to be implicitly updated by each read of the content of the | ||||
| file object. If an NFS client is caching the content of a file | ||||
| object, whether it is a regular file, directory, or symbolic link, the | ||||
| client <bcp14>SHOULD NOT</bcp14> update the time_access attribute (via SETATTR or a | ||||
| small READ or READDIR request) on the server with each read that is | ||||
| satisfied from cache. The reason is that this can defeat the | ||||
| performance benefits of caching content, especially since an explicit | ||||
| SETATTR of time_access may alter the change attribute on the server. | ||||
| If the change attribute changes, clients that are caching the content | ||||
| will think the content has changed, and will re-read unmodified data | ||||
| from the server. Nor is the client encouraged to maintain a modified | ||||
| version of time_access in its cache, since the client either would | ||||
| eventually have to write the access time to the server | ||||
| with bad performance effects or never update the | ||||
| server's time_access, thereby resulting in a situation where an | ||||
| application that caches access time between a close and open of | ||||
| the same file observes the access time oscillating between the past and | ||||
| present. The time_access attribute always means the time of last | ||||
| access to a file by a read that was satisfied by the server. This way | ||||
| clients will tend to see only time_access changes that go forward in | ||||
| time. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Data and Metadata Caching and Memory Mapped Files</name> | ||||
| <t> | ||||
| Some operating environments include the capability for an application | ||||
| to map a file's content into the application's address space. Each | ||||
| time the application accesses a memory location that corresponds to a | ||||
| block that has not been loaded into the address space, a page fault | ||||
| occurs and the file is read (or if the block does not exist in the | ||||
| file, the block is allocated and then instantiated in the | ||||
| application's address space). | ||||
| </t> | ||||
| <t> | ||||
| As long as each memory-mapped access to the file requires a page | ||||
| fault, the relevant attributes of the file that are used to detect | ||||
| access and modification (time_access, time_metadata, time_modify, and | ||||
| change) will be updated. However, in many operating environments, | ||||
| when page faults are not required, these attributes will not be updated | ||||
| on reads or updates to the file via memory access (regardless of | ||||
| whether the file is local or is accessed remotely). A client or | ||||
| server <bcp14>MAY</bcp14> fail to update attributes of a file that is being accessed | ||||
| via memory-mapped I/O. This has several implications: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If there is an application on the server that has memory mapped a file | ||||
| that a client is also accessing, the client may not be able to get a | ||||
| consistent value of the change attribute to determine | ||||
| whether or not its cache is stale. A server that knows that | ||||
| the file is memory-mapped could always pessimistically | ||||
| return updated values for change so as to force the | ||||
| application to always get the most up-to-date data | ||||
| and metadata for the file. However, due to the negative performance | ||||
| implications of this, such behavior is <bcp14>OPTIONAL</bcp14>. | ||||
| </li> | ||||
| <li> | ||||
| If the memory-mapped file is not being modified on the server, and | ||||
| instead is just being read by an application via the memory-mapped | ||||
| interface, the client will not see an updated time_access attribute. | ||||
| However, in many operating environments, neither will any process | ||||
| running on the server. Thus, NFS clients are at no disadvantage with | ||||
| respect to local processes. | ||||
| </li> | ||||
| <li> | ||||
| If there is another client that is memory mapping the file, and if | ||||
| that client is holding an OPEN_DELEGATE_WRITE delegation, the same set of issues as | ||||
| discussed in the previous two bullet points apply. So, when a server | ||||
| does a CB_GETATTR to a file that the client has modified in its cache, | ||||
| the reply from CB_GETATTR will not necessarily be accurate. As | ||||
| discussed earlier, the client's obligation is to report that the file | ||||
| has been modified since the delegation was granted, not whether it has | ||||
| been modified again between successive CB_GETATTR calls, and the | ||||
| server <bcp14>MUST</bcp14> assume that any file the client has modified in cache has | ||||
| been modified again between successive CB_GETATTR calls. Depending on | ||||
| the nature of the client's memory management system, even this weak | ||||
| obligation may be impossible to meet. A client <bcp14>MAY</bcp14> return stale information | ||||
| in CB_GETATTR whenever the file is memory-mapped. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The mixture of memory mapping and byte-range locking on the same file is | ||||
| problematic. Consider the following scenario, where the page size on | ||||
| each client is 8192 bytes. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Client A memory maps the first page (8192 bytes) of file X. | ||||
| </li> | ||||
| <li> | ||||
| Client B memory maps the first page (8192 bytes) of file X. | ||||
| </li> | ||||
| <li> | ||||
| Client A WRITE_LT locks the first 4096 bytes. | ||||
| </li> | ||||
| <li> | ||||
| Client B WRITE_LT locks the second 4096 bytes. | ||||
| </li> | ||||
| <li> | ||||
| Client A, via a STORE instruction, modifies part of its locked byte-range. | ||||
| </li> | ||||
| <li> | ||||
| Simultaneously with client A, client B executes a STORE on part of its | ||||
| locked byte-range. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Here the challenge is for each client to resynchronize to get a | ||||
| correct view of the first page. In many operating environments, the | ||||
| virtual memory management systems on each client only know a page is | ||||
| modified, not that a subset of the page corresponding to the | ||||
| respective lock byte-ranges has been modified. So it is not possible for | ||||
| each client to do the right thing, which is to write to the | ||||
| server only that portion of the page that is locked. For example, if | ||||
| client A simply writes out the page, and then client B writes out the | ||||
| page, client A's data is lost. | ||||
| </t> | ||||
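| <t> | ||||
| The lost update described above can be reproduced with a small | ||||
| simulation. The sketch below is illustrative only: the server file and | ||||
| each client's mapped page are modeled as byte arrays, and all function | ||||
| names are hypothetical. It contrasts page-granular write-back with | ||||
| write-back limited to each client's locked byte-range. | ||||
| </t> | ||||

```python
PAGE_SIZE = 8192
HALF = 4096


def store(page, offset, data):
    """Model a STORE into a client's private copy of the mapped page."""
    page[offset:offset + len(data)] = data


def write_back_whole_page(server_file, page):
    """Page-granular write-back: the VM system only knows the page is dirty."""
    server_file[0:PAGE_SIZE] = page


def write_back_locked_range(server_file, page, start, length):
    """Write-back limited to the byte-range the client actually holds locked."""
    server_file[start:start + length] = page[start:start + length]


def run(write_back_a, write_back_b):
    server_file = bytearray(PAGE_SIZE)
    page_a = bytearray(server_file)   # client A maps the first page
    page_b = bytearray(server_file)   # client B maps the first page
    store(page_a, 0, b"AAAA")         # A modifies its locked range [0, 4096)
    store(page_b, HALF, b"BBBB")      # B modifies its locked range [4096, 8192)
    write_back_a(server_file, page_a)
    write_back_b(server_file, page_b)
    return server_file


# Both clients write out the whole page: client A's data is overwritten.
lost = run(write_back_whole_page, write_back_whole_page)

# Each client writes only its locked half: both updates survive.
kept = run(
    lambda f, p: write_back_locked_range(f, p, 0, HALF),
    lambda f, p: write_back_locked_range(f, p, HALF, HALF),
)
```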
| <t> | ||||
| Moreover, if mandatory locking is enabled on the file, then we have a | ||||
| different problem. When clients A and B execute the STORE instructions, | ||||
| the resulting page faults require a byte-range lock on the entire page. | ||||
| Each client then tries to extend its locked range to the entire | ||||
| page, which results in a deadlock. Communicating the NFS4ERR_DEADLOCK | ||||
| error to a STORE instruction is difficult at best. | ||||
| </t> | ||||
| <t> | ||||
| If a client is locking the entire memory-mapped file, there is no | ||||
| problem with advisory or mandatory byte-range locking, at least until the | ||||
| client unlocks a byte-range in the middle of the file. | ||||
| </t> | ||||
| <t> | ||||
| Given the above issues, the following are permitted: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Clients and servers <bcp14>MAY</bcp14> deny memory mapping a file for which they know there are | ||||
| byte-range locks. | ||||
| </li> | ||||
| <li> | ||||
| Clients and servers <bcp14>MAY</bcp14> deny a byte-range lock on a file they know is | ||||
| memory-mapped. | ||||
| </li> | ||||
| <li> | ||||
| A client <bcp14>MAY</bcp14> deny memory mapping a file that it knows requires | ||||
| mandatory locking for I/O. If mandatory locking is enabled after the | ||||
| file is opened and mapped, the client <bcp14>MAY</bcp14> deny the application further | ||||
| access to its mapped file. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="without_dir_deleg" numbered="true" toc="default"> | ||||
| <name>Name and Directory Caching without Directory Delegations</name> | ||||
| <t> | ||||
| The NFSv4.1 directory delegation facility | ||||
| (described in <xref target="dir_deleg" format="default"/> below) is <bcp14>OPTIONAL</bcp14> | ||||
| for servers to implement. Even where it is | ||||
| implemented, it may not always be functional because of resource | ||||
| availability issues or other constraints. Thus, it is | ||||
| important to understand how name and directory caching are done | ||||
| in the absence of directory delegations. These topics are | ||||
| discussed in the next two subsections. | ||||
| </t> | ||||
| <section anchor="name_caching" numbered="true" toc="default"> | ||||
| <name>Name Caching</name> | ||||
| <t> | ||||
| The results of LOOKUP and READDIR operations may be cached to avoid | ||||
| the cost of subsequent LOOKUP operations. Just as in the case of | ||||
| attribute caching, inconsistencies may arise among the various client | ||||
| caches. To mitigate the effects of these inconsistencies and given | ||||
| the context of typical file system APIs, an upper time boundary is | ||||
| maintained for how long a client name cache entry can be kept without | ||||
| verifying that the entry has not been made invalid by a directory | ||||
| change operation performed by another client. | ||||
| </t> | ||||
| <t> | ||||
| When a client is not making changes to a directory for which there | ||||
| exist name cache entries, the client needs to periodically fetch | ||||
| attributes for that directory to ensure that it is not being modified. | ||||
| After determining that no modification has occurred, the expiration | ||||
| time for the associated name cache entries may be updated to be the | ||||
| current time plus the name cache staleness bound. | ||||
| </t> | ||||
| <t> | ||||
| When a client is making changes to a given directory, it needs to | ||||
| determine whether there have been changes made to the directory by | ||||
| other clients. It does this by using the change attribute as reported | ||||
| before and after the directory operation in the associated | ||||
| change_info4 value returned for the operation. The server is able to | ||||
| communicate to the client whether the change_info4 data is provided | ||||
| atomically with respect to the directory operation. If the change | ||||
| values are provided atomically, the client has a basis for determining, | ||||
| given proper care, whether other clients are modifying the directory | ||||
| in question. | ||||
| </t> | ||||
| <t> | ||||
| The simplest way to enable the client to make this determination is | ||||
| for the client to serialize all changes made to a specific directory. | ||||
| When this is done, and the server provides before and after values of the | ||||
| change attribute atomically, the client can simply compare the | ||||
| after value of the change attribute from one operation on a | ||||
| directory with the before value on the subsequent operation | ||||
| modifying that directory. When these are equal, the client is | ||||
| assured that no other client is modifying the directory in question. | ||||
| </t> | ||||
| <t> | ||||
| When such serialization is not used, and there may be multiple | ||||
| simultaneous outstanding operations modifying a single directory sent | ||||
| from a single client, making this sort of determination can be more | ||||
| complicated. If two such operations | ||||
| complete in a different order than they were actually performed, | ||||
| that might give an appearance consistent with modification being | ||||
| made by another client. Where this appears to happen, the client | ||||
| needs to await the completion of all such modifications that were | ||||
| started previously, to see if the outstanding before and after | ||||
| change numbers can be sorted into a chain such that the before | ||||
| value of one change number matches the after value of a previous | ||||
| one, in a chain consistent with this client being the only one | ||||
| modifying the directory. | ||||
| </t> | ||||
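| <t> | ||||
| The chain test described above might be sketched as follows. This is an | ||||
| assumption-laden illustration: forms_single_chain is a hypothetical | ||||
| helper, the pairs are the before and after values from change_info4 | ||||
| results that the server reported as atomic, and distinct directory | ||||
| states are assumed to have distinct change values. | ||||
| </t> | ||||

```python
def forms_single_chain(start_change, pairs):
    """Check whether (before, after) change pairs link into one chain.

    start_change is the change value the client last saw for the directory;
    pairs holds the before/after values of this client's own completed
    operations, in completion order.  Returns the final change value if
    every pair can be linked starting from start_change, else None.
    """
    by_before = {}
    for before, after in pairs:
        if before in by_before:
            return None          # two operations claim the same predecessor
        by_before[before] = after

    current = start_change
    for _ in range(len(pairs)):
        if current not in by_before:
            return None          # gap: another client changed the directory
        current = by_before.pop(current)
    return current
```

| <t> | ||||
| A result of None corresponds to the case in which the client cannot | ||||
| rule out modification by another client; otherwise, the returned value | ||||
| is the post-operation change value to save for future comparisons. | ||||
| </t> | ||||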
| <t> | ||||
| In either of these cases, the client is able to determine whether | ||||
| the directory is being modified by another client. | ||||
| If the comparison indicates that the directory was updated by | ||||
| another client, the name cache associated with the modified directory | ||||
| is purged from the client. If the comparison indicates no | ||||
| modification, the name cache can be updated on the client to reflect | ||||
| the directory operation and the associated timeout can be extended. The | ||||
| post-operation change value needs to be saved as the basis for future | ||||
| change_info4 comparisons. | ||||
| </t> | ||||
| <t> | ||||
| As demonstrated by the scenario above, name caching requires that the | ||||
| client revalidate name cache data by inspecting the change attribute | ||||
| of a directory at the point when the name cache item was cached. This | ||||
| requires that the server update the change attribute for directories | ||||
| when the contents of the corresponding directory are modified. For a | ||||
| client to use the change_info4 information appropriately and | ||||
| correctly, the server must report the pre- and post-operation change | ||||
| attribute values atomically. When the server is unable to report the | ||||
| before and after values atomically with respect to the directory | ||||
| operation, the server must indicate that fact in the change_info4 | ||||
| return value. When the information is not atomically reported, the | ||||
| client should not assume that other clients have not changed the | ||||
| directory. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Directory Caching</name> | ||||
| <t> | ||||
| The results of READDIR operations may be used to avoid subsequent | ||||
| READDIR operations. Just as in the cases of attribute and name | ||||
| caching, inconsistencies may arise among the various client caches. To | ||||
| mitigate the effects of these inconsistencies, and given the context of | ||||
| typical file system APIs, the following rules should be followed: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Cached READDIR information for a directory that is not obtained in a | ||||
| single READDIR operation must always be a consistent snapshot of | ||||
| directory contents. This is determined by using a GETATTR before the | ||||
| first READDIR and after the last READDIR that contributes to the | ||||
| cache. | ||||
| </li> | ||||
| <li> | ||||
| An upper time boundary is maintained to indicate the length of time a | ||||
| directory cache entry is considered valid before the client must | ||||
| revalidate the cached information. | ||||
| </li> | ||||
| </ul> | ||||
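| <t> | ||||
| The first rule above can be sketched as follows. The two callables are | ||||
| hypothetical stand-ins for a GETATTR of the directory's change attribute | ||||
| and for a sequence of READDIR replies; the retry limit is likewise an | ||||
| assumption of this sketch and is not specified by the protocol. | ||||
| </t> | ||||

```python
def read_directory_snapshot(getattr_change, readdir_pages, max_attempts=3):
    """Build a cached directory listing from several READDIR calls.

    getattr_change() returns the directory's current change attribute and
    readdir_pages() yields lists of entry names, one list per READDIR
    reply; both are hypothetical stand-ins for real RPCs.
    Returns (change, entries), or None if no consistent snapshot was obtained.
    """
    for _ in range(max_attempts):
        change_before = getattr_change()     # GETATTR before the first READDIR
        entries = []
        for page in readdir_pages():
            entries.extend(page)
        change_after = getattr_change()      # GETATTR after the last READDIR
        if change_before == change_after:
            return change_after, entries     # safe to cache as a unit
    return None
```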
| <t> | ||||
| The revalidation technique parallels that discussed in the case of | ||||
| name caching. When the client is not changing the directory in | ||||
| question, checking the change attribute of the directory with GETATTR | ||||
| is adequate. The lifetime of the cache entry can be extended at these | ||||
| checkpoints. When a client is modifying the directory, the client | ||||
| needs to use the change_info4 data to determine whether there are | ||||
| other clients modifying the directory. If it is determined that no | ||||
| other client modifications are occurring, the client may update its | ||||
| directory cache to reflect its own changes. | ||||
| </t> | ||||
| <t> | ||||
| As demonstrated previously, directory caching requires that the client | ||||
| revalidate directory cache data by inspecting the change attribute of | ||||
| a directory at the point when the directory was cached. This requires | ||||
| that the server update the change attribute for directories when the | ||||
| contents of the corresponding directory are modified. For a client to | ||||
| use the change_info4 information appropriately and correctly, the | ||||
| server must report the pre- and post-operation change attribute values | ||||
| atomically. When the server is unable to report the before and after | ||||
| values atomically with respect to the directory operation, the server | ||||
| must indicate that fact in the change_info4 return value. When the | ||||
| information is not atomically reported, the client should not assume | ||||
| that other clients have not changed the directory. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="dir_deleg" numbered="true" toc="default"> | ||||
| <name>Directory Delegations</name> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Introduction to Directory Delegations</name> | ||||
| <t> | ||||
| Directory caching for the NFSv4.1 protocol, as previously | ||||
| described, is similar to file | ||||
| caching in previous versions. Clients typically cache | ||||
| directory information for | ||||
| a duration determined by the client. At the end of a predefined | ||||
| timeout, the client will query the server to see if the directory has | ||||
| been updated. By caching attributes, clients reduce the number of | ||||
| GETATTR calls made to the server to validate attributes. Furthermore, | ||||
| frequently accessed files and directories, such as the current | ||||
| working directory, have their attributes cached on the client so that | ||||
| some NFS operations can be performed without having to make an RPC | ||||
| call. By caching name and inode information about most recently | ||||
| looked up entries in a Directory Name Lookup Cache (DNLC), clients do | ||||
| not need to send LOOKUP calls to the server every time these files | ||||
| are accessed. | ||||
| </t> | ||||
| <t> | ||||
| This caching approach works reasonably well at reducing network | ||||
| traffic in many environments. However, it does not address | ||||
| environments where there are numerous queries for files that do not | ||||
| exist. In these cases of "misses", the client sends requests to | ||||
| the server in order to provide reasonable application semantics and | ||||
| promptly detect the creation of new directory entries. An example of | ||||
| high-miss activity is compilation in software development | ||||
| environments. The current behavior of NFS limits its potential | ||||
| scalability and wide-area sharing effectiveness in these types of | ||||
| environments. Other distributed stateful file system architectures | ||||
| such as AFS and DFS have proven that adding state around directory | ||||
| contents can greatly reduce network traffic in high-miss | ||||
| environments. | ||||
| </t> | ||||
| <t> | ||||
| Delegation of directory contents is an <bcp14>OPTIONAL</bcp14> feature of NFSv4.1. | ||||
| Directory delegations provide traffic reduction | ||||
| benefits similar to those of file delegations. By allowing clients to cache | ||||
| directory contents (in a read-only fashion) while being notified of | ||||
| changes, the client can avoid making frequent requests to interrogate | ||||
| the contents of slowly-changing directories, reducing network traffic | ||||
| and improving client performance. It can also simplify the task of | ||||
| determining whether other clients are making changes to the directory | ||||
| when the client itself is making many changes to the directory and | ||||
| changes are not serialized. | ||||
| </t> | ||||
| <t> | ||||
| Directory delegations allow improved namespace cache consistency to be | ||||
| achieved through delegations and synchronous recalls, in the absence | ||||
| of notifications. In addition, if time-based consistency is | ||||
| sufficient, asynchronous notifications can provide performance | ||||
| benefits for the client, and possibly the server, under some common | ||||
| operating conditions such as slowly-changing and/or very large | ||||
| directories. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Directory Delegation Design</name> | ||||
| <t> | ||||
| NFSv4.1 introduces the GET_DIR_DELEGATION | ||||
| (<xref target="OP_GET_DIR_DELEGATION" format="default"/>) operation to allow the | ||||
| client to ask for a | ||||
| directory delegation. The delegation covers directory attributes and | ||||
| all entries in the directory. If either of these change, the | ||||
| delegation will be recalled synchronously. The operation causing the | ||||
| recall will have to wait until the recall is complete. Any changes | ||||
| to directory entry attributes will not cause the delegation to be | ||||
| recalled. | ||||
| </t> | ||||
| <t> | ||||
| In addition to asking for delegations, a client can also ask for | ||||
| notifications for certain events. These events include changes to | ||||
| the directory's attributes and/or its contents. If a client asks for | ||||
| notification for a certain event, the server will notify the client | ||||
| when that event occurs. This will not result in the delegation being | ||||
| recalled for that client. The notifications are asynchronous and | ||||
| provide a way of avoiding recalls in situations where a directory is | ||||
| changing enough that the pure recall model may not be effective while | ||||
| trying to allow the client to get substantial benefit. In the absence | ||||
| of notifications, once the delegation is recalled the client has to | ||||
| refresh its directory cache; this might not be very efficient for | ||||
| very large directories. | ||||
| </t> | ||||
| <t> | ||||
| The delegation is read-only and the client may not make changes to | ||||
| the directory other than by performing NFSv4.1 operations that modify | ||||
| the directory or the associated file attributes so that the server | ||||
| has knowledge of these changes. In order to keep the client's | ||||
| namespace synchronized with that of the server, the server will notify | ||||
| the delegation-holding client (assuming it has requested | ||||
| notifications) of the changes made as a result of that client's | ||||
| directory-modifying operations. This is to avoid any need for | ||||
| that client to send subsequent GETATTR or READDIR operations | ||||
| to the server. If a single client is holding the delegation | ||||
| and that client makes any changes to the directory (i.e., the | ||||
| changes are made via operations sent on a session | ||||
| associated with the client ID holding the delegation), the | ||||
| delegation will not be recalled. Multiple clients may hold a delegation | ||||
| on the same directory, but if any such client modifies the directory, | ||||
| the server <bcp14>MUST</bcp14> recall the delegation from the other clients, | ||||
| unless those clients have made provisions to be notified of that | ||||
| sort of modification. | ||||
| </t> | ||||
| <t> | ||||
| Delegations can be recalled by the server at any time. Normally, the | ||||
| server will recall the delegation when the directory changes in a way | ||||
| that is not covered by the notification, or when the directory | ||||
| changes and notifications have not been requested. | ||||
| If another client removes the directory for | ||||
| which a delegation has been granted, the server will recall the | ||||
| delegation. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Attributes in Support of Directory Notifications</name> | ||||
| <t> | ||||
| See <xref target="dir_not_attrs" format="default"/> for a description of the attributes | ||||
| associated with directory notifications. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Directory Delegation Recall</name> | ||||
| <t> | ||||
| The server will recall the directory delegation by sending a callback | ||||
| to the client. It will use the same callback procedure as used for | ||||
| recalling file delegations. The server will recall the delegation | ||||
| when the directory changes in a way that is not covered by the | ||||
| notification. However, the server need not recall the delegation if | ||||
| attributes of an entry within the directory change. | ||||
| </t> | ||||
| <t> | ||||
| If the | ||||
| server notices that handing out a delegation for a directory is | ||||
| causing too many notifications to be sent out, it may decide to | ||||
| not hand out delegations for that directory and/or recall those already | ||||
| granted. If a client tries to remove the directory for which | ||||
| a delegation has been granted, the server will recall all associated delegations. | ||||
| </t> | ||||
| <t> | ||||
| The implementation sections for a number | ||||
| of operations describe situations in which notification or | ||||
| delegation recall would be required under some common circumstances. | ||||
| In this regard, a similar set of caveats to those listed | ||||
| in <xref target="deleg_and_cb" format="default"/> apply. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For CREATE, see <xref target="OP_CREATE_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For LINK, see <xref target="OP_LINK_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For OPEN, see <xref target="OP_OPEN_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For REMOVE, see <xref target="OP_REMOVE_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For RENAME, see <xref target="OP_RENAME_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| For SETATTR, see <xref target="OP_SETATTR_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Directory Delegation Recovery</name> | ||||
| <t> | ||||
| Recovery from client or server restart for state on regular files | ||||
| has two main goals: avoiding the necessity of | ||||
| breaking application guarantees with respect to locked files and | ||||
| delivery of updates cached at the client. Neither of these | ||||
| goals applies to directories protected by OPEN_DELEGATE_READ delegations and | ||||
| notifications. Thus, no provision is made for reclaiming | ||||
| directory delegations in the event of client or server restart. | ||||
| The client can simply establish a directory delegation in the | ||||
| same fashion as was done initially. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="NEW11" numbered="true" toc="default"> | ||||
| <name>Multi-Server Namespace</name> | ||||
| <t> | ||||
| NFSv4.1 supports attributes that allow a namespace to extend | ||||
| beyond the boundaries of a single server. It is desirable | ||||
| that clients and servers support construction of such | ||||
| multi-server namespaces. Use of such multi-server namespaces | ||||
| is <bcp14>OPTIONAL</bcp14>, however, and for many purposes, | ||||
| single-server namespaces are perfectly acceptable. The use | ||||
| of multi-server namespaces can provide many advantages | ||||
| by separating a file system's logical position in a namespace | ||||
| from the (possibly changing) logistical and administrative | ||||
| considerations that cause a particular file system to be | ||||
| located on a particular server via a single network access | ||||
| path that has to be known in advance or determined using DNS. | ||||
| </t> | ||||
| <section anchor="SEC11-TERM" numbered="true" toc="default"> | ||||
| <name>Terminology</name> | ||||
| <t> | ||||
| In this section as a whole (i.e., within all of <xref target="NEW11" format="default"/>), | ||||
| the phrase "client ID" always refers to the | ||||
| 64-bit shorthand identifier assigned by the server (a clientid4) | ||||
| and never to the structure that the client uses to identify itself | ||||
| to the server (called an nfs_client_id4 or client_owner in NFSv4.0 | ||||
| and NFSv4.1, respectively). The opaque identifier within those | ||||
| structures is referred to as a "client id string". | ||||
| </t> | ||||
| <section anchor="SEC11-TERM-trunking" numbered="true" toc="default"> | ||||
| <name>Terminology Related to Trunking</name> | ||||
| <t> | ||||
| It is particularly important to clarify the distinction | ||||
| between trunking detection and trunking discovery. | ||||
| The definitions we present are applicable to all | ||||
| minor versions of NFSv4, but we will focus on how | ||||
| these terms apply to NFS version 4.1. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Trunking detection refers to ways of deciding whether two | ||||
| specific network | ||||
| addresses are connected to the same NFSv4 server. The | ||||
| means available to make this determination depend on the protocol | ||||
| version, and, in some cases, on the client implementation. | ||||
| </t> | ||||
| <t> | ||||
| In the case of NFS version 4.1 and later minor versions, the | ||||
| means of | ||||
| trunking detection are as described in this document | ||||
| and are available to every client. Two network addresses | ||||
| connected to the same server can always be used together | ||||
| to access a particular server | ||||
| but cannot necessarily be used together | ||||
| to access a single session. See below for definitions | ||||
| of the terms "server-trunkable" and "session-trunkable". | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Trunking discovery is a process by which a client using one | ||||
| network address can obtain other addresses that are connected | ||||
| to the same server. | ||||
| Typically, it builds on a trunking detection facility by providing | ||||
| one or more methods by which candidate addresses are made | ||||
| available to the client, | ||||
| which can then use trunking detection to appropriately filter them. | ||||
| </t> | ||||
| <t> | ||||
| Despite the support for trunking detection, there was no | ||||
| description of trunking discovery provided in | ||||
| RFC 5661 <xref target="RFC5661" format="default"/>, making it necessary to provide | ||||
| those means in this document. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The combination of a server network address and a particular | ||||
| connection type to be used by a connection | ||||
| is referred to as a "server endpoint". Although using different | ||||
| connection types may result in different ports being used, the | ||||
| use of different ports by multiple connections to the same | ||||
| network address in such cases is not the essence of the distinction | ||||
| between the two endpoints used. This is in contrast to the case | ||||
| of port-specific endpoints, | ||||
| in which the explicit specification of port numbers within network | ||||
| addresses is used to allow a single server node to support multiple | ||||
| NFS servers. | ||||
| </t> | ||||
| <t> | ||||
| Two network addresses connected to the same server are said to | ||||
| be server-trunkable. Two such addresses support the use of | ||||
| client ID trunking, as described in <xref target="Trunking" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Two network addresses connected to the same server such that | ||||
| those addresses can be used to support a single common session | ||||
| are referred to as session-trunkable. Note that two addresses | ||||
| may be server-trunkable without being session-trunkable, and that, | ||||
| when two connections of different connection types are made | ||||
| to the same network address and are based on a single file | ||||
| system location entry, they are always | ||||
| session-trunkable, independent of the connection type, as | ||||
| specified by <xref target="Trunking" format="default"/>, since their derivation from | ||||
| the same file system location entry, together with the identity of | ||||
| their network addresses, assures that both connections are to the | ||||
| same server and will return server-owner information, allowing | ||||
| session trunking to be used. | ||||
| </t> | ||||
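| <t> | ||||
| As an informal illustration of the distinction drawn above, the following Python sketch classifies a pair of network addresses using values that each address returned in response to EXCHANGE_ID. It assumes the comparison described in <xref target="Trunking" format="default"/>: matching server scope and major server-owner identifier for server-trunkability, plus a matching minor identifier for session-trunkability. The record and function names are hypothetical and are not part of the protocol. | ||||
| </t> | ||||
| <sourcecode type="python"> | ||||

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Subset of EXCHANGE_ID results used here (illustrative field names)."""
    server_scope: bytes
    so_major_id: bytes
    so_minor_id: int


def classify_trunking(a: ExchangeIdResult, b: ExchangeIdResult) -> str:
    """Return 'session', 'server', or 'none' for two network addresses."""
    same_server = (a.server_scope == b.server_scope
                   and a.so_major_id == b.so_major_id)
    if not same_server:
        # Different servers: no trunking relationship at all.
        return "none"
    if a.so_minor_id == b.so_minor_id:
        # Session-trunkable addresses are also server-trunkable.
        return "session"
    # Same server, but a single session cannot span both addresses.
    return "server"
```

| </sourcecode> | ||||
| <t> | ||||
| In this sketch, a result of "server" means that only client ID trunking is available, while "session" means that either form of trunking may be used. | ||||
| </t> | ||||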
| </section> | ||||
| <section anchor="SEC11-TERM-loc" numbered="true" toc="default"> | ||||
| <name>Terminology Related to File System Location</name> | ||||
| <t> | ||||
| Regarding the terminology that relates to the construction of multi-server | ||||
| namespaces out of a set of local per-server namespaces: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Each server has a set of exported file systems that may be accessed | ||||
| by NFSv4 clients. Typically, this is done by assigning each | ||||
| file system a name within the pseudo-fs associated with the | ||||
| server, although the pseudo-fs may be dispensed with if there | ||||
| is only a single exported file system. Each such file system | ||||
| is part of the server's local namespace, and can be considered | ||||
| as a file system instance within a larger multi-server namespace. | ||||
| </li> | ||||
| <li> | ||||
| The set of all exported file systems for a given server | ||||
| constitutes that server's local namespace. | ||||
| </li> | ||||
| <li> | ||||
| In some cases, a server will have a namespace more extensive | ||||
| than its local namespace by using features associated with | ||||
| attributes that provide file system location information. | ||||
| These features, | ||||
| which allow construction of a multi-server namespace, | ||||
| are all described in individual sections below and include | ||||
| referrals (<xref target="SEC11-USES-ref" format="default"/>), | ||||
| migration (<xref target="SEC11-USES-migr" format="default"/>), and | ||||
| replication (<xref target="SEC11-USES-repl" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| A file system present in a server's pseudo-fs may have multiple | ||||
| file system instances on different servers associated with it. | ||||
| All such instances are considered replicas of one another. | ||||
| Whether such replicas can be used simultaneously is discussed in | ||||
| <xref target="SEC11-EFF-simul" format="default"/>, while the level of | ||||
| coordination between them (important when switching | ||||
| between them) is discussed in Sections | ||||
| <xref target="SEC11-EFF-fh" format="counter"/> | ||||
| through <xref target="SEC11-EFF-data" format="counter"/> below. | ||||
| </li> | ||||
| <li> | ||||
| When a file system is present in a server's pseudo-fs, but | ||||
| there is no corresponding local file system, it is said to | ||||
| be "absent". In such cases, all associated instances will | ||||
| be accessed on other servers. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Regarding the terminology that relates to attributes used in trunking | ||||
| discovery and other multi-server namespace features: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| File system location attributes include the fs_locations and | ||||
| fs_locations_info attributes. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| File system location entries provide the individual file system | ||||
| locations within the file system location attributes. | ||||
| Each such entry specifies a | ||||
| server, in the form of a hostname or an address, and an fs name, | ||||
| which designates the location of the file system within | ||||
| the server's local namespace. A file system location entry designates a set | ||||
| of server endpoints to which the client may establish connections. | ||||
| There may be multiple endpoints because a hostname may map to | ||||
| multiple network addresses and because multiple connection types | ||||
| may be | ||||
| used to communicate with a single network address. However, | ||||
| except where explicit port numbers are used to designate a set | ||||
| of servers within a single server node, all | ||||
| such endpoints <bcp14>MUST</bcp14> designate a way of connecting to a single server. | ||||
| The exact form of the location entry varies with the | ||||
| particular file system location attribute used, as described in | ||||
| <xref target="SEC11-loc-attr" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The network addresses used in file system location entries | ||||
| typically appear without port number indications and are | ||||
| used to designate a server at one of the standard ports for NFS access, | ||||
| e.g., 2049 for TCP or 20049 for use with RPC-over-RDMA. Port | ||||
| numbers may be used | ||||
| in file system location entries to designate servers (typically | ||||
| user-level ones) accessed using other port numbers. In the case where | ||||
| network addresses indicate trunking relationships, the use of an explicit | ||||
| port number is inappropriate since trunking is a relationship between | ||||
| network addresses. See <xref target="SEC11-USES-trunk" format="default"/> for | ||||
| details. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| File system location elements are derived from | ||||
| location entries, and each | ||||
| describes a particular network access path consisting of a network | ||||
| address and a location within the server's local namespace. | ||||
| Such location elements need not appear | ||||
| within a file system location attribute, but the | ||||
| existence of each location element derives from a corresponding | ||||
| location entry. When a | ||||
| location entry specifies an IP address, there is only a single | ||||
| corresponding location element. File system location entries that | ||||
| contain a hostname are resolved using DNS, and may result | ||||
| in one or more location elements. All location elements | ||||
| consist of a location address that includes the IP address of | ||||
| an interface to a server and an fs name, which is the location | ||||
| of the file system within the server's local namespace. The fs name | ||||
| can be empty if the server has no pseudo-fs and only a single exported | ||||
| file system at the root filehandle. | ||||
| </li> | ||||
| <li> | ||||
| Two file system location elements are said to be | ||||
| server-trunkable if they | ||||
| specify the same fs name and their location addresses | ||||
| are server-trunkable. When the | ||||
| corresponding network paths are used, the client will always be | ||||
| able to use client ID trunking, but will only be able to use | ||||
| session trunking if the paths are also session-trunkable. | ||||
| </li> | ||||
| <li> | ||||
| Two file system location elements are said to be session-trunkable | ||||
| if they | ||||
| specify the same fs name and their location addresses | ||||
| are session-trunkable. When the | ||||
| corresponding network paths are used, the client will be able to | ||||
| use either client ID trunking or session trunking. | ||||
| </li> | ||||
| </ul> | ||||
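| <t> | ||||
| The derivation of location elements from location entries described in the list above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the class names, the expand function, and the caller-supplied resolve callback (standing in for a DNS lookup) are all hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"> | ||||

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LocationEntry:
    server: str   # hostname or IP address literal
    fs_name: str  # location within the server's local namespace


@dataclass(frozen=True)
class LocationElement:
    address: str  # IP address of one interface to the server
    fs_name: str


def is_ip_literal(text: str) -> bool:
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def expand(entry: LocationEntry,
           resolve: Callable[[str], List[str]]) -> List[LocationElement]:
    """Derive the location elements that correspond to one entry."""
    if is_ip_literal(entry.server):
        # An IP address yields exactly one location element.
        addresses = [entry.server]
    else:
        # A hostname may resolve to one or more addresses.
        addresses = resolve(entry.server)
    return [LocationElement(addr, entry.fs_name) for addr in addresses]
```

| </sourcecode> | ||||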
| <t> | ||||
| Discussion of the term "replica" is complicated by the fact that | ||||
| the term was used in RFC 5661 <xref target="RFC5661" format="default"/> with a meaning | ||||
| different from that used in this document. In short, | ||||
| in <xref target="RFC5661" format="default"/> each replica is identified by a | ||||
| single network access path, while in the current document, a set | ||||
| of network access paths that have server-trunkable network | ||||
| addresses and the same root-relative file system pathname is | ||||
| considered to be a single replica with multiple network access | ||||
| paths. | ||||
| </t> | ||||
| <t> | ||||
| Each set of server-trunkable location elements defines a set of | ||||
| available network access paths to a particular file system. | ||||
| When there | ||||
| are multiple such file systems, each of which contains the | ||||
| same data, these file systems are considered replicas | ||||
| of one another. Logically, such replication | ||||
| is symmetric, since the fs currently in use and an alternate fs | ||||
| are replicas of each other. Often, in other documents, the term | ||||
| "replica" is not applied to the fs currently in use, despite the | ||||
| fact that the replication relation is inherently symmetric. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-loc-attr" numbered="true" toc="default"> | ||||
| <name>File System Location Attributes</name> | ||||
| <t> | ||||
| NFSv4.1 contains attributes that provide information | ||||
| about how a given file system may be accessed | ||||
| (i.e., at what network address and namespace position). As a result, file systems | ||||
| in the namespace of one server can be | ||||
| associated with one or more instances of that | ||||
| file system on other servers. These attributes contain file | ||||
| system location | ||||
| entries specifying a server address | ||||
| target (either as a DNS name representing one or more IP | ||||
| addresses or as a specific IP address) together with the pathname | ||||
| of that file system within the associated single-server namespace. | ||||
| </t> | ||||
| <t> | ||||
| The fs_locations_info <bcp14>RECOMMENDED</bcp14> attribute | ||||
| allows specification of one or more file system instance locations | ||||
| where the data corresponding to a given file | ||||
| system may be found. | ||||
| In addition to the specification of file system instance locations, | ||||
| this attribute provides helpful information to do the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Guide choices among the various file system instances | ||||
| provided (e.g., priority for use, writability, currency, etc.). | ||||
| </li> | ||||
| <li> | ||||
| Help the client efficiently effect as seamless | ||||
| a transition as possible among multiple file system instances, | ||||
| when and if that should be necessary. | ||||
| </li> | ||||
| <li> | ||||
| Guide the selection of the appropriate | ||||
| connection type to be used when establishing a connection. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Within the fs_locations_info attribute, each | ||||
| fs_locations_server4 entry corresponds to a file system | ||||
| location entry: the fls_server field designates the server, | ||||
| and the fl_rootpath field of the encompassing fs_locations_item4 | ||||
| gives the location pathname within the server's pseudo-fs. | ||||
| </t> | ||||
| <t> | ||||
| The fs_locations attribute defined in NFSv4.0 is also a part of | ||||
| NFSv4.1. This attribute only allows specification of the file system | ||||
| locations where the data corresponding to a given file | ||||
| system may be found. Servers <bcp14>SHOULD</bcp14> make this attribute available | ||||
| whenever fs_locations_info is supported, but client use of | ||||
| fs_locations_info is preferable because it provides more information. | ||||
| </t> | ||||
| <t> | ||||
| Within the fs_locations attribute, each fs_location4 contains a | ||||
| file system location entry with the server field designating | ||||
| the server and the rootpath field giving the location pathname | ||||
| within the server's pseudo-fs. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="presence_or_absence" numbered="true" toc="default"> | ||||
| <name>File System Presence or Absence</name> | ||||
| <t> | ||||
| A given location in an NFSv4.1 namespace (typically but not necessarily | ||||
| a multi-server namespace) can have a number of file system instance | ||||
| locations | ||||
| associated with it (via the fs_locations or fs_locations_info | ||||
| attribute). There may also be an actual current file system at | ||||
| that location, accessible via normal namespace operations (e.g., | ||||
| LOOKUP). In this case, the file system is said to be | ||||
| "present" at that position in the namespace, and clients will | ||||
| typically use it, reserving use of additional locations | ||||
| specified via the location-related attributes to situations in | ||||
| which the principal location is no longer available. | ||||
| </t> | ||||
| <t> | ||||
| When there is no actual file system at the namespace location | ||||
| in question, the file system is said to be "absent". An absent | ||||
| file system contains no files or directories other than the | ||||
| root. Any reference to it, except | ||||
| to access a small set of attributes useful in determining | ||||
| alternate locations, will result in an error, NFS4ERR_MOVED. | ||||
| Note that if the server ever returns the error NFS4ERR_MOVED, | ||||
| it <bcp14>MUST</bcp14> support the fs_locations | ||||
| attribute and <bcp14>SHOULD</bcp14> support the fs_locations_info and fs_status | ||||
| attributes. | ||||
| </t> | ||||
| <t> | ||||
| While the error name suggests that we have a case of a file system | ||||
| that once was present, and has only become absent later, this is | ||||
| only one possibility. A position in the namespace may be permanently | ||||
| absent with the set of file system(s) designated by the location | ||||
| attributes being the only realization. | ||||
| The name NFS4ERR_MOVED reflects an earlier, | ||||
| more limited conception of its function, but this error will be | ||||
| returned whenever the referenced file system is absent, whether it | ||||
| has moved or not. | ||||
| </t> | ||||
| <t> | ||||
| Except in the case of GETATTR-type operations (to be discussed | ||||
| later), when the | ||||
| current filehandle at the start of an operation is within an | ||||
| absent file system, that operation is not performed and the error | ||||
| NFS4ERR_MOVED is returned, to indicate that the file system is | ||||
| absent on the current server. | ||||
| </t> | ||||
| <t> | ||||
| Because a GETFH cannot succeed if the current filehandle is | ||||
| within an absent file system, filehandles within an absent | ||||
| file system cannot be transferred to the client. When a | ||||
| client does have filehandles within an absent file system, it | ||||
| is the result of obtaining them when the file system was | ||||
| present, and having the file system become | ||||
| absent subsequently. | ||||
| </t> | ||||
| <t> | ||||
| It should be noted that because the check for the current | ||||
| filehandle being within an absent file system happens at the | ||||
| start of every operation, operations that change the current | ||||
| filehandle so that it is within an absent file system will not | ||||
| result in an error. This allows such combinations as | ||||
| PUTFH-GETATTR and LOOKUP-GETATTR to be used to get attribute | ||||
| information, particularly location attribute information, | ||||
| as discussed below. | ||||
| </t> | ||||
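| <t> | ||||
| The effect of performing the absence check at the start of each operation can be illustrated with a small simulation of COMPOUND processing. The sketch below is deliberately simplified and rests on stated assumptions: operations are (name, argument) pairs, PUTFH and LOOKUP simply set the current filehandle to their argument, and every GETATTR is assumed to request a location-related attribute. The function name and data layout are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"> | ||||

```python
def run_compound(ops, absent_handles):
    """Simulate per-operation absence checks.

    ops is a list of (name, arg) tuples; absent_handles is the set of
    filehandles that lie within an absent file system.
    Returns a list of (name, status) tuples, stopping at the first error.
    """
    current_fh = None
    results = []
    for name, arg in ops:
        # The check looks at the filehandle as it stands BEFORE the operation.
        in_absent = current_fh in absent_handles
        if in_absent and name != "GETATTR":
            results.append((name, "NFS4ERR_MOVED"))
            break
        if name in ("PUTFH", "LOOKUP"):
            # Moving INTO an absent file system is not itself an error.
            current_fh = arg
        results.append((name, "NFS4_OK"))
    return results
```

| </sourcecode> | ||||
| <t> | ||||
| Under these assumptions, a PUTFH of a filehandle within an absent file system followed by GETATTR succeeds, while a subsequent READDIR in the same request receives NFS4ERR_MOVED. | ||||
| </t> | ||||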
| <t> | ||||
| The <bcp14>RECOMMENDED</bcp14> file system attribute fs_status | ||||
| can be used to interrogate the present/absent status of a | ||||
| given file system. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="absent_fs_attributes" numbered="true" toc="default"> | ||||
| <name>Getting Attributes for an Absent File System</name> | ||||
| <t> | ||||
| When a file system is absent, most attributes are not available, | ||||
| but it is necessary to allow the client access to the small | ||||
| set of attributes that are available, and most particularly | ||||
| those that give information about the correct current locations | ||||
| for this file system: fs_locations and fs_locations_info. | ||||
| </t> | ||||
| <section anchor="absent_getattr" numbered="true" toc="default"> | ||||
| <name>GETATTR within an Absent File System</name> | ||||
| <t> | ||||
| As mentioned above, an exception is made for GETATTR in that | ||||
| attributes may be obtained for a filehandle within an absent | ||||
| file system. This exception only applies if the attribute | ||||
| mask contains at least one attribute bit that indicates the | ||||
| client is interested in a result regarding an absent file | ||||
| system: fs_locations, fs_locations_info, or fs_status. | ||||
| If none of these attributes | ||||
| is requested, GETATTR will result in an NFS4ERR_MOVED error. | ||||
| </t> | ||||
| <t> | ||||
| When a GETATTR is done on an absent file system, the set of | ||||
| supported attributes is very limited. Many attributes, including | ||||
| those that are normally <bcp14>REQUIRED</bcp14>, will not be available on an | ||||
| absent file system. In addition to the attributes mentioned | ||||
| above (fs_locations, fs_locations_info, fs_status), the following | ||||
| attributes <bcp14>SHOULD</bcp14> be available on absent file systems. In the | ||||
| case of <bcp14>RECOMMENDED</bcp14> attributes, they should be available at | ||||
| least to the same degree that they are available on present file systems. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>change_policy:</dt> | ||||
| <dd> | ||||
| This attribute is useful for absent file systems | ||||
| and can be helpful in summarizing to the client when any | ||||
| of the location-related attributes change. | ||||
| </dd> | ||||
| <dt>fsid:</dt> | ||||
| <dd> | ||||
| This attribute should be provided so that the client | ||||
| can determine file system boundaries, including, in | ||||
| particular, the boundary between present and absent file | ||||
| systems. This value must be different from any other fsid | ||||
| on the current server and need have no particular relationship | ||||
| to fsids on any particular destination to which the client | ||||
| might be directed. | ||||
| </dd> | ||||
| <dt>mounted_on_fileid:</dt> | ||||
| <dd> | ||||
| For objects at the top of an absent | ||||
| file system, this attribute needs to be available. Since | ||||
| the fileid is within the present parent file | ||||
| system, there should be no need to reference the absent file | ||||
| system to provide this information. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| Other attributes <bcp14>SHOULD NOT</bcp14> be made available for absent file | ||||
| systems, even when it is possible to provide them. The server | ||||
| should not assume that more information is always better and | ||||
| should avoid gratuitously providing additional information. | ||||
| </t> | ||||
| <t> | ||||
| When a GETATTR operation includes a bit mask for one of the | ||||
| attributes fs_locations, fs_locations_info, or fs_status, but | ||||
| where the bit mask includes attributes that are not supported, | ||||
| GETATTR will not return an error, but will return the mask | ||||
| of the actual attributes supported with the results. | ||||
| </t> | ||||
| <t> | ||||
| Handling of VERIFY/NVERIFY is similar to GETATTR in that if | ||||
| the attribute mask does not include fs_locations, fs_locations_info, | ||||
| or fs_status, the error NFS4ERR_MOVED will result. It differs in | ||||
| that any appearance in the attribute mask of an attribute not | ||||
| supported for an absent file system (and note that this will | ||||
| include some normally <bcp14>REQUIRED</bcp14> attributes) will also cause | ||||
| an NFS4ERR_MOVED result. | ||||
| </t> | ||||
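| <t> | ||||
| The GETATTR and VERIFY/NVERIFY rules above can be summarized in a short sketch. It is illustrative only and makes simplifying assumptions: attributes are represented by name rather than by bit position, the set of attributes available on an absent file system is taken to be exactly the six named in this section, and the value "PROCEED" merely indicates that the attribute comparison would go ahead. All identifiers other than the attribute and error names are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"> | ||||

```python
# Attributes whose presence in the mask enables the GETATTR exception.
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})
# Further attributes that SHOULD be available on an absent file system.
EXTRA_ATTRS = frozenset({"change_policy", "fsid", "mounted_on_fileid"})
AVAILABLE_WHEN_ABSENT = LOCATION_ATTRS | EXTRA_ATTRS


def getattr_absent(requested):
    """Return (status, attributes returned) for GETATTR on an absent fs."""
    requested = frozenset(requested)
    if requested.isdisjoint(LOCATION_ATTRS):
        return ("NFS4ERR_MOVED", frozenset())
    # Unsupported attributes are silently dropped from the result mask.
    return ("NFS4_OK", requested.intersection(AVAILABLE_WHEN_ABSENT))


def verify_absent(requested):
    """Return the outcome of the mask check for VERIFY/NVERIFY."""
    requested = frozenset(requested)
    if requested.isdisjoint(LOCATION_ATTRS):
        return "NFS4ERR_MOVED"
    if not requested.issubset(AVAILABLE_WHEN_ABSENT):
        # Unlike GETATTR, any unsupported attribute is an error here.
        return "NFS4ERR_MOVED"
    return "PROCEED"
```

| </sourcecode> | ||||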
| </section> | ||||
| <section anchor="absent_readdir" numbered="true" toc="default"> | ||||
| <name>READDIR and Absent File Systems</name> | ||||
| <t> | ||||
| A READDIR performed when the current filehandle is within an | ||||
| absent file system will result in an NFS4ERR_MOVED error, | ||||
| since, unlike the case of GETATTR, no such exception is | ||||
| made for READDIR. | ||||
| </t> | ||||
| <t> | ||||
| Attributes for an absent file system may be fetched via a | ||||
| READDIR for a directory in a present file system, when that | ||||
| directory contains the root directories of one or more absent | ||||
| file systems. In this case, the handling is as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the attribute set requested includes one of the attributes | ||||
| fs_locations, fs_locations_info, or fs_status, then fetching of | ||||
| attributes proceeds normally and no NFS4ERR_MOVED indication | ||||
| is returned, even when the rdattr_error attribute is | ||||
| requested. | ||||
| </li> | ||||
| <li> | ||||
| If the attribute set requested does not include one of the | ||||
| attributes | ||||
| fs_locations, fs_locations_info, or fs_status, then if the | ||||
| rdattr_error attribute is requested, each directory entry for | ||||
| the root of an absent file system will report | ||||
| NFS4ERR_MOVED as the value of the rdattr_error attribute. | ||||
| </li> | ||||
| <li> | ||||
| If the attribute set requested does not include any of the | ||||
| attributes fs_locations, fs_locations_info, fs_status, or | ||||
| rdattr_error, then the occurrence of the root of an absent | ||||
| file system within the directory will result in the | ||||
| READDIR failing with an NFS4ERR_MOVED error. | ||||
| </li> | ||||
| <li> | ||||
| The unavailability of an attribute because of a file system's | ||||
| absence, even one that is ordinarily <bcp14>REQUIRED</bcp14>, does not result | ||||
| in any error indication. The set of attributes returned for | ||||
| the root directory of the absent file system in that case is | ||||
| simply restricted to those actually available. | ||||
| </li> | ||||
| </ul> | ||||
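| <t> | ||||
| The cases listed above can be expressed as a decision function applied to each directory entry that is the root of an absent file system. The following is a sketch only, with attributes represented by name; the function name and the shape of its return value are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"> | ||||

```python
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})


def readdir_absent_root(requested):
    """Outcome for one entry that is the root of an absent file system.

    Returns ("entry", rdattr_error_value) when the entry is returned, or
    ("fail", error) when the READDIR as a whole fails.
    """
    requested = frozenset(requested)
    if not requested.isdisjoint(LOCATION_ATTRS):
        # Attributes are fetched normally, limited to those available;
        # no NFS4ERR_MOVED is reported even if rdattr_error was requested.
        return ("entry", None)
    if "rdattr_error" in requested:
        # The error is reported per entry through rdattr_error.
        return ("entry", "NFS4ERR_MOVED")
    # Neither a location attribute nor rdattr_error was requested.
    return ("fail", "NFS4ERR_MOVED")
```

| </sourcecode> | ||||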
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-USES" numbered="true" toc="default"> | ||||
| <name>Uses of File System Location Information</name> | ||||
| <t> | ||||
| The file system location attributes | ||||
| (i.e., fs_locations and fs_locations_info), | ||||
| together with the possibility of absent file systems, provide | ||||
| a number of important facilities for reliable, manageable, | ||||
| and scalable data access. | ||||
| </t> | ||||
| <t> | ||||
| When a file system is present, these attributes can provide | ||||
| the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The locations of alternative replicas to be used to access the | ||||
| same data in the event of server failures, communications problems, | ||||
| or other difficulties that make continued access to the current | ||||
| replica impossible or otherwise impractical. Provisioning and | ||||
| use of such alternate replicas is referred to as "replication" | ||||
| and is discussed in | ||||
| <xref target="SEC11-USES-repl" format="default"/> below. | ||||
| </li> | ||||
| <li> | ||||
| The network address(es) to be used to access the current file | ||||
| system instance or replicas of it. Client use of this information is | ||||
| discussed in <xref target="SEC11-USES-trunk" format="default"/> below. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Under some circumstances, multiple replicas | ||||
| may be used simultaneously to provide higher-performance | ||||
| access to the file system in question, although the lack of state | ||||
| sharing between servers may be an impediment to such use. | ||||
| </t> | ||||
| <t> | ||||
| When a file system is present but becomes absent, clients can be | ||||
| given the opportunity to have continued access to their data | ||||
| using a different replica. In this case, a continued attempt | ||||
| to use the data in the now-absent file system will result | ||||
| in an NFS4ERR_MOVED error, and then the successor | ||||
| replica or set of possible replica choices | ||||
| can be fetched and used to continue access. Transfer of access | ||||
| to the new replica location is referred to as | ||||
| "migration" and is discussed in | ||||
| <xref target="SEC11-USES-migr" format="default"/> below. | ||||
| </t> | ||||
| <t> | ||||
| When a file system is currently absent, specification | ||||
| of file system location provides a means by which file systems | ||||
| located on one server can be associated with a namespace | ||||
| defined by another server, thus allowing a general multi-server | ||||
| namespace facility. A designation of such a remote instance, in | ||||
| place of a file system not previously present, is called | ||||
| a "pure referral" and is discussed in | ||||
| <xref target="SEC11-USES-ref" format="default"/> below. | ||||
| </t> | ||||
| <t> | ||||
| Because client support for attributes related to file | ||||
| system location is | ||||
| <bcp14>OPTIONAL</bcp14>, a server may choose to take action | ||||
| to hide migration and referral events from such clients, by | ||||
| acting as a proxy, for example. The server can determine | ||||
| the presence of client support from the arguments of the | ||||
| EXCHANGE_ID operation (see | ||||
| <xref target="OP_EXCHANGE_ID_DESCRIPTION" format="default"/>). | ||||
| </t> | ||||
| <section anchor="SEC11-USES-mult" numbered="true" toc="default"> | ||||
| <name>Combining Multiple Uses in a Single Attribute</name> | ||||
| <t> | ||||
| A file system location attribute will sometimes contain information | ||||
| relating to the location of multiple replicas, which may | ||||
| be used in different ways: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| File system location entries that relate to the file system instance | ||||
| currently in | ||||
| use provide trunking information, allowing the client to | ||||
| find additional network addresses by which the instance may be | ||||
| accessed. | ||||
| </li> | ||||
| <li> | ||||
| File system location entries that provide information about | ||||
| replicas to which access is to be transferred. | ||||
| </li> | ||||
| <li> | ||||
| Other file system location entries that relate to replicas | ||||
| that are available to | ||||
| use in the event that access to the current replica becomes | ||||
| unsatisfactory. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In order to simplify client handling and to allow the best choice | ||||
| of replicas to access, the server should adhere to the following | ||||
| guidelines: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| All file system location entries that relate to a | ||||
| single file system instance should be adjacent. | ||||
| </li> | ||||
| <li> | ||||
| File system location entries that relate to the instance | ||||
| currently in use should appear first. | ||||
| </li> | ||||
| <li> | ||||
| File system location entries that relate to replica(s) | ||||
| to which migration | ||||
| is occurring should appear before replicas that are available | ||||
| for later use if the current replica should become inaccessible. | ||||
| </li> | ||||
| </ul> | ||||
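| <t> | ||||
| The ordering guidelines above can be illustrated with a minimal sketch. | ||||
| The Entry type, its field names, and the three role labels are | ||||
| assumptions made for illustration only; they are not defined by the protocol. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from dataclasses import dataclass

# Hypothetical role labels for a location entry (illustrative only).
CURRENT, MIGRATION_TARGET, SPARE = 0, 1, 2


@dataclass(frozen=True)
class Entry:
    instance: str  # assumed label identifying a file system instance
    address: str   # server name or network address
    role: int      # CURRENT, MIGRATION_TARGET, or SPARE


def order_entries(entries):
    """Order entries: current instance first, then migration targets,
    then spares, keeping entries for one instance adjacent."""
    best_role = {}
    first_seen = {}
    for index, entry in enumerate(entries):
        previous = best_role.get(entry.instance, entry.role)
        best_role[entry.instance] = min(previous, entry.role)
        first_seen.setdefault(entry.instance, index)
    # sorted() is stable, so entries of one instance keep their
    # original relative order.
    return sorted(
        entries,
        key=lambda e: (best_role[e.instance], first_seen[e.instance]),
    )
```
| ]]></sourcecode> | ||||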
| </section> | ||||
| <section anchor="SEC11-USES-trunk" numbered="true" toc="default"> | ||||
| <name>File System Location Attributes and Trunking</name> | ||||
| <t> | ||||
| Trunking is the use of multiple connections between a client and | ||||
| server in order to increase the speed of data transfer. | ||||
| A client may determine the set of network addresses to use to | ||||
| access a given file system in a number of ways: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When the name of the server is known to the client, it may use | ||||
| DNS to obtain a set of network addresses to use in | ||||
| accessing the server. | ||||
| </li> | ||||
| <li> | ||||
| The client may fetch the file system location attribute for the | ||||
| file system. This will | ||||
| provide either the name of the server (which can be turned | ||||
| into a set of network addresses using DNS) or | ||||
| a set of server-trunkable location entries. Using the latter | ||||
| alternative, the server can | ||||
| provide addresses it regards as desirable to use | ||||
| to access the file system in question. Although these entries can | ||||
| contain port numbers, these port numbers are not used in determining | ||||
| trunking relationships. Once the candidate addresses have been | ||||
| determined and EXCHANGE_ID done to the proper server, only the value | ||||
| of the so_major_id field returned by the servers in question determines | ||||
| whether a trunking relationship actually exists. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When the client fetches a location attribute | ||||
| for a file system, it may encounter multiple entries for a number of | ||||
| reasons. Consequently, when it determines trunking information, it may | ||||
| need | ||||
| to bypass addresses not trunkable with one already known. | ||||
| </t> | ||||
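| <t> | ||||
| As a non-normative sketch of this filtering step, the function below keeps | ||||
| only those candidate addresses whose so_major_id matches that of an | ||||
| address already in use. The exchange_id callable is a hypothetical | ||||
| stand-in for issuing EXCHANGE_ID to an address; it is not a real API. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def trunkable_addresses(known_major_id, candidates, exchange_id):
    """Return the candidates that are server-trunkable with a known address.

    exchange_id is a hypothetical callable that performs EXCHANGE_ID
    against an address and returns the so_major_id it reports, or None
    if the address could not be reached.
    """
    usable = []
    for address in candidates:
        major_id = exchange_id(address)
        # Only the so_major_id comparison decides trunkability; entries
        # that are unreachable or report another value are bypassed.
        if major_id is not None and major_id == known_major_id:
            usable.append(address)
    return usable
```
| ]]></sourcecode> | ||||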
| <t> | ||||
| The server can provide location entries that include either | ||||
| names or network addresses. It might use the latter form | ||||
| because of DNS-related security concerns or because the set | ||||
| of addresses | ||||
| to be used might require active management by the server. | ||||
| </t> | ||||
| <t> | ||||
| Location entries used to discover candidate addresses for | ||||
| use in trunking are subject to change, as discussed in | ||||
| <xref target="SEC11-USES-changes" format="default"/> below. | ||||
| The client may respond to | ||||
| such changes by using additional addresses once they are | ||||
| verified or by ceasing to use | ||||
| existing ones. The server can force the client to cease using | ||||
| an address by returning NFS4ERR_MOVED when that address is used to | ||||
| access a file system. This allows a transfer of client access | ||||
| that is similar to migration, although the same file system instance | ||||
| is accessed throughout. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-USES-types" numbered="true" toc="default"> | ||||
| <name>File System Location Attributes and Connection Type Selection</name> | ||||
| <t> | ||||
| Because of the need to support multiple types of connections, | ||||
| clients face | ||||
| the issue of determining the proper connection type to use | ||||
| when establishing | ||||
| a connection to a given server network address. In some cases, | ||||
| this issue can be addressed through the use of the connection | ||||
| "step-up" facility described in | ||||
| <xref target="OP_CREATE_SESSION" format="default"/>. However, | ||||
| because there are cases in which that facility is not available, | ||||
| the client may have to choose a connection type with no | ||||
| possibility of changing it within the scope of a single connection. | ||||
| </t> | ||||
| <t> | ||||
| The two file system location attributes differ as to the | ||||
| information made available in this regard. The fs_locations attribute provides no information | ||||
| to support connection type selection. As a result, clients | ||||
| supporting multiple connection types would need to attempt to | ||||
| establish connections using multiple connection types until | ||||
| the one preferred by the client is successfully established. | ||||
| </t> | ||||
| <t> | ||||
| The fs_locations_info attribute includes the FSLI4TF_RDMA flag, | ||||
| which is convenient for a client wishing to use RDMA. When this | ||||
| flag is set, it indicates that RPC-over-RDMA support is available | ||||
| using the specified location entry. A client can establish a TCP | ||||
| connection and then convert that connection to use RDMA by using | ||||
| the step-up facility. | ||||
| </t> | ||||
| <t> | ||||
| Irrespective of the particular attribute used, when there is | ||||
| no indication that a step-up operation can be performed, | ||||
| a client supporting RDMA operation can establish a new RDMA | ||||
| connection, and it can be bound to | ||||
| the session already established by the | ||||
| TCP connection, allowing the TCP connection to be dropped | ||||
| and the session converted to further use in RDMA mode, if | ||||
| the server supports that. | ||||
| </t> | ||||
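| <t> | ||||
| The choices described in this section might be summarized by the following | ||||
| sketch. The function name and the returned step labels are hypothetical, | ||||
| and the numeric value given for FSLI4TF_RDMA is an assumption that should | ||||
| be checked against the protocol's XDR definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Assumed value; consult the XDR definition before relying on it.
FSLI4TF_RDMA = 0x01


def plan_connection(client_supports_rdma, tflags=None):
    """Return an ordered list of illustrative connection steps.

    tflags is the transport-flags value from an fs_locations_info
    entry, or None when only fs_locations is available.
    """
    if not client_supports_rdma:
        return ["tcp"]
    if tflags is not None and tflags & FSLI4TF_RDMA:
        # RDMA is advertised for this entry: start on TCP, then
        # convert the connection using the step-up facility.
        return ["tcp", "step-up"]
    # No indication that step-up is possible: try a separate RDMA
    # connection bound to the session created over TCP, which only
    # works if the server supports it.
    return ["tcp", "bind-new-rdma"]
```
| ]]></sourcecode> | ||||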
| </section> | ||||
| <section anchor="SEC11-USES-repl" numbered="true" toc="default"> | ||||
| <name>File System Replication</name> | ||||
| <t> | ||||
| The fs_locations and fs_locations_info attributes provide | ||||
| alternative file system locations, to be used to access data in place | ||||
| of or in addition to | ||||
| the current file system instance. On first access to a | ||||
| file system, the client should obtain the set | ||||
| of alternate locations by interrogating the fs_locations or | ||||
| fs_locations_info attribute, with the latter being preferred. | ||||
| </t> | ||||
| <t> | ||||
| In the event that server failures, communications | ||||
| problems, | ||||
| or other difficulties make continued access to the current | ||||
| file system impossible or otherwise impractical, the client | ||||
| can use the alternate locations as a way to get continued | ||||
| access to its data. | ||||
| </t> | ||||
| <t> | ||||
| The alternate locations may be physical replicas of the | ||||
| (typically read-only) file system data supplemented by | ||||
| possible asynchronous propagation of updates. Alternatively, | ||||
| they may provide for the use of various forms of server | ||||
| clustering in which multiple servers provide alternate | ||||
| ways of accessing the same physical file system. How the | ||||
| difference between replicas affects file system transitions | ||||
| can be represented within the fs_locations and fs_locations_info | ||||
| attributes, and how the client deals with file system transition | ||||
| issues will be discussed in detail in later sections. | ||||
| </t> | ||||
| <t> | ||||
| Although the location attributes provide some information about | ||||
| the nature of the inter-replica transition, many aspects of the | ||||
| semantics of possible asynchronous updates are not currently described | ||||
| by the protocol, which makes it necessary for clients using replication | ||||
| to switch among replicas undergoing change to familiarize themselves | ||||
| with the semantics of the update approach used. | ||||
| Due to this lack of specificity, many applications may find the | ||||
| use of migration more appropriate because a server can propagate | ||||
| all updates made before an established point in time to the new | ||||
| replica as part of the migration event. | ||||
| </t> | ||||
| <section anchor="SEC11-USES-repl-trunk" numbered="true" toc="default"> | ||||
| <name>File System Trunking Presented as Replication</name> | ||||
| <t> | ||||
| In some situations, a file system location entry may indicate | ||||
| a file system access path to be used as an alternate location, | ||||
| where trunking, rather than replication, is to be used. The | ||||
| situations in which this is appropriate are limited to those | ||||
| in which both of the following are true: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The two file system locations (i.e., the one on which the | ||||
| location attribute is obtained and the one specified in the | ||||
| file system location entry) designate the same locations within | ||||
| their respective single-server namespaces. | ||||
| </li> | ||||
| <li> | ||||
| The two server network addresses (i.e., the one being used to | ||||
| obtain the location attribute and the one specified in the file system | ||||
| location entry) designate the same server (as indicated by the | ||||
| same value of the so_major_id field of the eir_server_owner field | ||||
| returned in response to EXCHANGE_ID). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When these conditions hold, operations using both access paths are | ||||
| generally trunked, although trunking may be disallowed when the | ||||
| attribute fs_locations_info is used: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| When the fs_locations_info attribute shows the two entries | ||||
| as not having the same simultaneous-use class, trunking is | ||||
| inhibited, and the two access paths cannot be used together. | ||||
| </t> | ||||
| <t> | ||||
| In this case, the two paths can be used serially with no | ||||
| transition activity required on the part of the client, and any | ||||
| transition between access paths is transparent. In transferring | ||||
| access from one to the other, the client acts as if communication | ||||
| were interrupted, establishing a new connection and possibly a | ||||
| new session to continue access to the same file system. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Note that for two such location entries, any information within | ||||
| the fs_locations_info attribute that indicates the need for special | ||||
| transition activity, i.e., the appearance of the two file system | ||||
| location entries with different handle, fileid, write-verifier, | ||||
| change, and readdir classes, indicates a serious problem. The | ||||
| client, if it allows transition to the file system instance at | ||||
| all, must not treat any transition as a transparent one. | ||||
| The server <bcp14>SHOULD NOT</bcp14> indicate that these two entries (for the | ||||
| same file system on the same server) belong to | ||||
| different handle, fileid, write-verifier, change, and readdir | ||||
| classes, whether or not the two entries are shown belonging to | ||||
| the same simultaneous-use class. | ||||
| </li> | ||||
| </ul> | ||||
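| <t> | ||||
| The decision described above can be sketched as follows. The dictionary | ||||
| keys and the returned labels are illustrative assumptions, and the | ||||
| simultaneous-use class is treated as absent when only fs_locations | ||||
| is available. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def classify_alternate(current, alternate):
    """Classify an alternate location entry relative to the current one.

    Each argument is a dict with the hypothetical keys "path" and
    "so_major_id", plus an optional "simultaneous_use_class".
    """
    same_path = current["path"] == alternate["path"]
    same_server = current["so_major_id"] == alternate["so_major_id"]
    if not (same_path and same_server):
        return "replica"
    class_a = current.get("simultaneous_use_class")
    class_b = alternate.get("simultaneous_use_class")
    if class_a is not None and class_b is not None and class_a != class_b:
        # Trunking is inhibited; the paths may only be used serially.
        return "serial"
    return "trunked"
```
| ]]></sourcecode> | ||||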
| <t> | ||||
| These situations were recognized by <xref target="RFC5661" format="default"/>, | ||||
| even though that document made no explicit mention of trunking: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| It treated the situation that we describe as trunking as one | ||||
| of simultaneous use of two distinct file system instances, | ||||
| even though, in the explanatory framework now used to | ||||
| describe the situation, the case is one in which a single file | ||||
| system is accessed by two different trunked addresses. | ||||
| </li> | ||||
| <li> | ||||
| It treated the situation in which two paths are to be used | ||||
| serially as a special sort of "transparent transition". However, | ||||
| in the descriptive framework now used to categorize transition | ||||
| situations, this is considered a case of a "network endpoint | ||||
| transition" (see <xref target="SEC11-trans-oview" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-USES-migr" numbered="true" toc="default"> | ||||
| <name>File System Migration</name> | ||||
| <t> | ||||
| When a file system is present and becomes inaccessible using the | ||||
| current access path, the NFSv4.1 protocol provides a means by | ||||
| which clients can be given the opportunity to have continued access to their data. | ||||
| This may involve using a different access path to the existing replica or | ||||
| providing a path to a different replica. The new access path or | ||||
| the location of the new replica is specified by a file system | ||||
| location attribute. The ensuing migration of access includes | ||||
| the ability to retain locks across the transition. Depending on circumstances, | ||||
| this can involve: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The continued use of the existing clientid when accessing | ||||
| the current replica using a new access path. | ||||
| </li> | ||||
| <li> | ||||
| Use of lock reclaim, taking advantage of a per-fs grace period. | ||||
| </li> | ||||
| <li> | ||||
| Use of Transparent State Migration. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Typically, a client | ||||
| accessing the file system in question will get an NFS4ERR_MOVED | ||||
| error and then use a file system location attribute | ||||
| to determine the new access path for the data. When | ||||
| fs_locations_info is used, additional information will be | ||||
| available that will define the nature of the client's | ||||
| handling of the transition to a new server. | ||||
| </t> | ||||
| <t> | ||||
| In most instances, servers will choose to migrate all clients using | ||||
| a particular file system to a successor replica at the same time | ||||
| to avoid cases in which different clients are updating different | ||||
| replicas. However, migration of an individual client can be helpful | ||||
| in providing load balancing, as long as the replicas in question | ||||
| are such that they represent the same data as described in | ||||
| <xref target="SEC11-EFF-data" format="default"/>. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| In the case in which there is no transition between replicas (i.e., | ||||
| only a change in access path), there are no special | ||||
| difficulties in using this mechanism to effect load balancing. | ||||
| </li> | ||||
| <li> | ||||
| In the case in which the two replicas are sufficiently coordinated | ||||
| as to allow a single client coherent, simultaneous access to both, | ||||
| there is, in general, no obstacle to the use of migration of particular | ||||
| clients to effect load balancing. Generally, such simultaneous use | ||||
| involves cooperation between servers to ensure that locks granted | ||||
| on two coordinated replicas cannot conflict and can remain effective | ||||
| when transferred to a common replica. | ||||
| </li> | ||||
| <li> | ||||
| In the case in which a large set of clients is accessing a | ||||
| file system in a read-only fashion, it can be helpful to migrate | ||||
| all clients with writable access simultaneously, while using | ||||
| load balancing on the set of read-only copies, as long as the | ||||
| rules in <xref target="SEC11-EFF-data" format="default"/>, | ||||
| which are designed to prevent data reversion, are followed. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In other cases, the client might not have sufficient guarantees | ||||
| of data similarity or coherence to function properly (e.g., the data | ||||
| in the two replicas is similar but not identical), and the | ||||
| possibility that different clients are updating different replicas | ||||
| can exacerbate the difficulties, making the use of load balancing in | ||||
| such situations a perilous enterprise. | ||||
| </t> | ||||
| <t> | ||||
| The protocol does not specify how the file system will be moved between | ||||
| servers or how updates to multiple replicas will be coordinated. | ||||
| It is anticipated that a number of different | ||||
| server-to-server coordination mechanisms might be used, with the | ||||
| choice left to the server implementer. The NFSv4.1 protocol | ||||
| specifies the method used to communicate the migration | ||||
| event between client and server. | ||||
| </t> | ||||
| <t> | ||||
| In the case of various forms of server clustering, the new location | ||||
| may be another server providing access to the same physical file system. The client's | ||||
| responsibilities in dealing with this transition will depend | ||||
| on whether a switch between replicas has occurred and | ||||
| the means the server has chosen to provide continuity of locking state. | ||||
| These issues will be discussed in detail below. | ||||
| </t> | ||||
| <t> | ||||
| Although a single successor location is typical, multiple | ||||
| locations may be provided. When multiple locations are | ||||
| provided, the client will typically use the first one provided. | ||||
| If that is inaccessible for some reason, later ones can be used. In such | ||||
| cases, the client might consider the transition to the new | ||||
| replica to be a migration event, even though some of the servers | ||||
| involved might not be aware of the use of the server that was | ||||
| inaccessible. In such a case, a client might lose access to | ||||
| locking state as a result of the access transfer. | ||||
| </t> | ||||
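| <t> | ||||
| A minimal sketch of this selection follows. The is_accessible callable | ||||
| is a hypothetical probe supplied by the caller; the second returned value | ||||
| merely flags that a later entry was used, in which case locking state | ||||
| might not be available at the chosen location. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def choose_successor(locations, is_accessible):
    """Pick the first accessible successor location, in listed order.

    Returns (location, fell_back). fell_back is True when an earlier
    entry had to be skipped because it was inaccessible.
    """
    for position, location in enumerate(locations):
        if is_accessible(location):
            fell_back = position != 0
            return location, fell_back
    return None, False
```
| ]]></sourcecode> | ||||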
| <t> | ||||
| When an alternate location is designated as the target for | ||||
| migration, it must designate the same data | ||||
| (with metadata being the same to the degree indicated by the | ||||
| fs_locations_info attribute). Where file systems are writable, | ||||
| a change made on the original file system must be visible on | ||||
| all migration targets. Where a file system is not writable | ||||
| but represents a read-only copy (possibly periodically | ||||
| updated) of | ||||
| a writable file system, similar requirements apply to the | ||||
| propagation of updates. Any change visible in the original | ||||
| file system must already be effected on all migration targets, | ||||
| to avoid any possibility that a client, in effecting a transition to | ||||
| the migration target, will see any reversion in file system state. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-USES-ref" numbered="true" toc="default"> | ||||
| <name>Referrals</name> | ||||
| <t> | ||||
| Referrals allow the server to associate a file system namespace | ||||
| entry located on one server with a file system located on another server. | ||||
| When this includes | ||||
| the use of pure referrals, servers are provided a way of | ||||
| placing a file system in a location within the namespace | ||||
| essentially without respect to its physical location on a | ||||
| particular server. This allows a single server or a set of servers | ||||
| to present a multi-server namespace that encompasses file systems | ||||
| located on a wider range of servers. Some likely uses of this facility include | ||||
| establishment of site-wide or organization-wide namespaces, | ||||
| with the eventual possibility of combining such namespaces | ||||
| into a truly global namespace, such as the one | ||||
| provided by AFS (the Andrew File System) <xref target="AFS" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Referrals occur when a client determines, upon first referencing | ||||
| a position in the current namespace, that it is part of a new | ||||
| file system and that the file system is absent. When this | ||||
| occurs, typically upon receiving the error NFS4ERR_MOVED, the | ||||
| actual location or locations of the file system can be | ||||
| determined by fetching a file system location attribute. | ||||
| </t> | ||||
| <t> | ||||
| The file system location attribute may designate a single | ||||
| file system location or multiple file system locations, to | ||||
| be selected based on the needs of the client. The server, | ||||
| in the fs_locations_info attribute, may specify priorities to | ||||
| be associated with various file system location choices. | ||||
| The server may assign different priorities to different | ||||
| locations as reported to individual clients, in order to | ||||
| adapt to client physical location or to effect load balancing. | ||||
| When both read-only and read-write file systems are present, | ||||
| some of the read-only locations might not be absolutely up-to-date | ||||
| (as they would have to be in the case of replication and | ||||
| migration). Servers may also specify file system locations | ||||
| that include client-substituted variables so that different | ||||
| clients are referred to different file systems (with different | ||||
| data contents) based on client attributes such as CPU | ||||
| architecture. | ||||
| </t> | ||||
| <t> | ||||
| If the fs_locations_info attribute lists multiple possible targets, | ||||
| the relationships among them may be important to the client in | ||||
| selecting which one to use. | ||||
| The same rules specified in <xref target="SEC11-USES-migr" format="default"/> | ||||
| above regarding multiple migration targets | ||||
| apply to these multiple replicas as well. For example, the | ||||
| client might prefer a writable target on a server that has | ||||
| additional writable | ||||
| replicas to which it subsequently might switch. Note that, | ||||
| as distinguished from the case of replication, there is no | ||||
| need to deal with the case of propagation of updates made by | ||||
| the current client, since the current client has not accessed | ||||
| the file system in question. | ||||
| </t> | ||||
| <t> | ||||
| Use of multi-server namespaces is enabled by NFSv4.1 but is not | ||||
| required. The use of multi-server namespaces and their scope | ||||
| will depend on the applications used and system administration | ||||
| preferences. | ||||
| </t> | ||||
| <t> | ||||
| Multi-server namespaces can be established by a single | ||||
| server providing a large set of pure referrals to all of the | ||||
| included file systems. Alternatively, a single multi-server | ||||
| namespace may be administratively segmented with separate | ||||
| referral file systems (on separate servers) for each | ||||
| separately administered portion of the namespace. The | ||||
| top-level referral file system or any segment may use | ||||
| replicated referral file systems for higher availability. | ||||
| </t> | ||||
| <t> | ||||
| Multi-server namespaces are, for the most part, | ||||
| uniform, in that the same data made available to one client | ||||
| at a given location in the namespace is made available to | ||||
| all clients at that namespace location. However, | ||||
| there are facilities | ||||
| provided that allow different clients to be directed to | ||||
| different sets of data, for reasons such as enabling | ||||
| adaptation to such client | ||||
| characteristics as CPU architecture. These facilities are | ||||
| described in | ||||
| <xref target="SEC11-fsli-item" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Note that it is possible, when providing a uniform namespace, | ||||
| to provide different location entries to different clients in | ||||
| order to provide each client with a copy of the data physically | ||||
| closest to it or otherwise optimize access (e.g., provide load | ||||
| balancing). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-USES-changes" numbered="true" toc="default"> | ||||
| <name>Changes in a File System Location Attribute</name> | ||||
| <t> | ||||
| Although clients will typically fetch a file system location attribute | ||||
| when first accessing a file system and when NFS4ERR_MOVED | ||||
| is returned, a client can choose to fetch the attribute | ||||
| periodically, in which case, the value fetched may change over time. | ||||
| </t> | ||||
| <t> | ||||
| For clients not prepared to access multiple replicas simultaneously (see | ||||
| <xref target="SEC11-EFF-simul" format="default"/>), | ||||
| the handling of the various cases of location change is as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Changes in the list of replicas or in the network addresses | ||||
| associated with replicas do not require immediate action. | ||||
| The client will typically update its list of replicas to | ||||
| reflect the new information. | ||||
| </li> | ||||
| <li> | ||||
| Additions to the list of network addresses for the | ||||
| current file system instance need not be acted | ||||
| on promptly. However, to prepare for a subsequent | ||||
| migration event, the client can choose | ||||
| to take note of the new address and then use it | ||||
| whenever it needs to switch access to a new replica. | ||||
| </li> | ||||
| <li> | ||||
| Deletions from the list of network addresses for the | ||||
| current file system instance do not require the client to immediately | ||||
| cease use of existing access paths, although new connections | ||||
| are not to be established on addresses that have been deleted. | ||||
| However, clients can choose to act on such deletions | ||||
| by preparing for an eventual shift in access, which | ||||
| becomes unavoidable as soon as the server returns | ||||
| NFS4ERR_MOVED to indicate that a particular network access path is | ||||
| not usable to access the current file system. | ||||
| </li> | ||||
| </ul> | ||||
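| <t> | ||||
| As an illustration of the address-list cases above, the sketch below | ||||
| compares the previously fetched and newly fetched address lists for the | ||||
| current file system instance. The function name and result keys are | ||||
| hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def apply_address_update(old_addrs, new_addrs, connected):
    """Summarize how a client might react to a changed address list.

    connected lists the addresses on which connections already exist.
    """
    old_set = set(old_addrs)
    new_set = set(new_addrs)
    return {
        # Additions need no prompt action but can be remembered.
        "noted_for_later": sorted(new_set - old_set),
        # Deleted addresses are not to receive new connections.
        "no_new_connections": sorted(old_set - new_set),
        # Existing connections may continue until NFS4ERR_MOVED.
        "still_usable_until_moved": sorted(set(connected) - new_set),
    }
```
| ]]></sourcecode> | ||||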
| <t> | ||||
| For clients that are prepared to access several replicas simultaneously, | ||||
| the following additional cases need to be addressed. As in | ||||
| the cases discussed above, changes in the set of replicas | ||||
| need not be acted upon promptly, although the client has | ||||
| the option of adjusting its access even in the absence of | ||||
| difficulties that would lead to the selection of a new replica. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When a new replica is added that may be accessed | ||||
| simultaneously with one currently in use, the client is free | ||||
| to use the new replica immediately. | ||||
| </li> | ||||
| <li> | ||||
| When a replica currently in use is deleted from the list, the | ||||
| client need not cease using it immediately. However, since | ||||
| the server may subsequently force such use to cease (by | ||||
| returning NFS4ERR_MOVED), clients might decide to limit the | ||||
| need for later state transfer. For example, new opens might | ||||
| be done on other replicas, rather than on one not present in | ||||
| the list. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-TRUNK" numbered="true" toc="default"> | ||||
| <name>Trunking without File System Location Information</name> | ||||
| <t> | ||||
| In situations in which a file system is accessed using two | ||||
| server-trunkable addresses (as indicated by the same value of the | ||||
| so_major_id field of the eir_server_owner field returned in | ||||
| response to EXCHANGE_ID), trunked access is allowed even though | ||||
| there might not be any location entries specifically indicating | ||||
| the use of trunking for that file system. | ||||
| </t> | ||||
| <t> | ||||
| This situation was recognized by <xref target="RFC5661" format="default"/>, although | ||||
| that document made no explicit mention of trunking and treated the | ||||
| situation as one of simultaneous use of two distinct file system | ||||
| instances. In the explanatory framework now used to | ||||
| describe the situation, the case is one in which a single file | ||||
| system is accessed by two different trunked addresses. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-users" numbered="true" toc="default"> | ||||
| <name>Users and Groups in a Multi-Server Namespace</name> | ||||
| <t> | ||||
| As in the case of a single-server environment (see | ||||
| <xref target="owner_owner_group" format="default"/>), | ||||
| when an owner or group name of the form "id@domain" is assigned to | ||||
| a file, there is an implicit promise to return that same string when | ||||
| the corresponding attribute is interrogated subsequently. In the | ||||
| case of a multi-server namespace, that same promise applies even if | ||||
| server boundaries have been crossed. Similarly, when the owner | ||||
| attribute of a file is derived from the security principal that created | ||||
| the file, that attribute should have the same value even if the | ||||
| interrogation occurs on a different server from the file creation. | ||||
| </t> | ||||
| <t> | ||||
| Similarly, the set of security principals recognized by all the | ||||
| participating servers needs to be the same, with each such principal | ||||
| having the same credentials, regardless of the particular server | ||||
| being accessed. | ||||
| </t> | ||||
| <t> | ||||
| In order to meet these requirements, those setting up multi-server | ||||
| namespaces will need to limit the servers included so that: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| In all cases in which more than a single domain is supported, | ||||
| the requirements stated in RFC 8000 <xref target="RFC8000" format="default"/> | ||||
| are to be respected. | ||||
| </li> | ||||
| <li> | ||||
| All servers support a common set of domains that includes all of | ||||
| the domains clients use and expect to see returned as the domain | ||||
| portion of an owner or group in the form "id@domain". Note that, | ||||
| although this set most often consists of a single domain, it is | ||||
| possible for multiple domains to be supported. | ||||
| </li> | ||||
| <li> | ||||
| All servers, for each domain that they support, accept the same set | ||||
| of user and group ids as valid. | ||||
| </li> | ||||
| <li> | ||||
| All servers recognize the same set of security principals. For each | ||||
| principal, the same credential is required, independent of the | ||||
| server being accessed. In addition, the group membership for each such | ||||
| principal is to be the same, independent of the server accessed. | ||||
| </li> | ||||
| </ul> | ||||
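| <t> | ||||
| An administrator might check a proposed set of servers against the last | ||||
| three conditions with something like the sketch below. The data layout | ||||
| (a mapping from server name to its domains, ids, and principals) and the | ||||
| problem labels are assumptions for illustration; credentials and group | ||||
| membership are not modeled. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def namespace_inconsistencies(servers, client_domains):
    """Report mismatches among servers in a multi-server namespace.

    servers maps a server name to a dict with the hypothetical keys
    "domains" (domain -> set of user and group ids) and "principals"
    (set of security principal names).
    """
    if not servers:
        return []
    problems = []
    names = sorted(servers)
    common = set.intersection(*(set(servers[n]["domains"]) for n in names))
    missing = set(client_domains) - common
    if missing:
        problems.append(("domains-not-common", sorted(missing)))
    all_domains = set().union(*(servers[n]["domains"] for n in names))
    for domain in sorted(all_domains):
        id_sets = {
            frozenset(servers[n]["domains"][domain])
            for n in names
            if domain in servers[n]["domains"]
        }
        if len(id_sets) != 1:
            problems.append(("ids-differ", domain))
    if len({frozenset(servers[n]["principals"]) for n in names}) != 1:
        problems.append(("principals-differ", None))
    return problems
```
| ]]></sourcecode> | ||||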
| <t> | ||||
| Note that there is no requirement in general that the users | ||||
| corresponding to particular security principals have the same local | ||||
| representation on each server, even though it is most often the case that this is so. | ||||
| </t> | ||||
| <t> | ||||
| When AUTH_SYS is used, the following additional requirements must be met: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Only a single NFSv4 domain can be supported through the use of AUTH_SYS. | ||||
| </li> | ||||
| <li> | ||||
| The "local" representation of all owners and groups must be the same | ||||
| on all servers. The word "local" is used here since that is the | ||||
| way that numeric user and group ids are described in | ||||
| <xref target="owner_owner_group" format="default"/>. However, | ||||
| when AUTH_SYS or stringified numeric owners or | ||||
| groups are used, these identifiers are not truly local, since they | ||||
| are known to the clients as well as to the server. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Similarly, when stringified numeric user and group ids are used, the | ||||
| "local" representation of all owners and groups must be the same on | ||||
| all servers, even when AUTH_SYS is not used. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-csr" numbered="true" toc="default"> | ||||
| <name>Additional Client-Side Considerations</name> | ||||
| <t> | ||||
| When clients make use of servers that implement referrals, | ||||
| replication, and | ||||
| migration, care should be taken that a user who mounts a given | ||||
| file system that includes a referral or a relocated file system | ||||
| continues to see a coherent picture of that user-side file system | ||||
| despite the fact that it contains a number of server-side | ||||
| file systems that may be on different servers. | ||||
| </t> | ||||
| <t> | ||||
| One important issue is upward navigation from the root of a | ||||
| server-side file system to its parent (specified as ".." in UNIX), | ||||
| in the case in which the client reached that file system as a | ||||
| result of a referral, a migration, or a transition associated with | ||||
| replication. When the client is at such a point, and it needs to ascend to | ||||
| the parent, it must go back to the parent as seen within the | ||||
| multi-server namespace rather than sending a LOOKUPP operation to the | ||||
| server, which would result in the parent within that server's | ||||
| single-server namespace. In order to do this, the client | ||||
| needs to remember the filehandles that represent such | ||||
| file system roots and use these instead of sending a | ||||
| LOOKUPP operation to the current server. This will allow the client | ||||
| to present to applications a consistent namespace, where | ||||
| upward navigation and downward navigation are consistent. | ||||
| </t> | ||||
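| <t> | ||||
| The bookkeeping described above can be sketched as follows. This is a minimal, non-normative illustration in Python. The class NamespaceTracker, its methods, the (server, filehandle) pairing, and the send_lookupp callback are hypothetical names chosen for the example and are not defined by the protocol. | ||||
| </t> | ||||

```python
from typing import Callable, Dict, Tuple

# A (server, filehandle) pair; both parts are illustrative assumptions.
Handle = Tuple[str, bytes]


class NamespaceTracker:
    """Hypothetical client-side record of file system roots entered via a
    referral, migration, or replication transition."""

    def __init__(self) -> None:
        # Root of a server-side file system -> its parent directory as
        # seen within the multi-server namespace.
        self._parent_of_root: Dict[Handle, Handle] = {}

    def record_crossing(self, parent: Handle, root: Handle) -> None:
        """Remember the parent whenever the client crosses into a new root."""
        self._parent_of_root[root] = parent

    def lookup_parent(self, current: Handle,
                      send_lookupp: Callable[[Handle], Handle]) -> Handle:
        """Resolve ".." without leaving the multi-server namespace."""
        if current in self._parent_of_root:
            # At a remembered root: use the saved parent rather than LOOKUPP.
            return self._parent_of_root[current]
        # Elsewhere, LOOKUPP on the current server gives the right answer.
        return send_lookupp(current)
```

| <t> | ||||
| In this sketch, a LOOKUPP operation is sent only when the current directory is not a remembered file system root, so upward navigation retraces the path taken downward. | ||||
| </t> | ||||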
| <t> | ||||
| Another issue concerns refresh of referral locations. When | ||||
| referrals are used extensively, they may change as server | ||||
| configurations change. It is expected that clients will cache | ||||
| information related to traversing referrals so that future | ||||
| client-side requests are resolved locally without server | ||||
| communication. | ||||
| This is usually rooted in client-side name lookup caching. Clients | ||||
| should periodically purge this data for referral points in order to | ||||
| detect changes in location information. When the change_policy | ||||
| attribute changes for directories that hold referral entries | ||||
| or for the referral entries themselves, clients should consider | ||||
| any associated | ||||
| cached referral information to be out of date. | ||||
| </t> | ||||
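| <t> | ||||
| One possible staleness check for cached referral information is sketched below in Python. It is illustrative only: the CachedReferral record, the representation of change_policy as a pair of integers, and the client-chosen max_age interval are assumptions of the example, not protocol requirements. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedReferral:
    locations: List[str]            # cached location entries (illustrative)
    change_policy: Tuple[int, int]  # change_policy value seen when cached
    fetched_at: float               # time the entry was cached, in seconds


def referral_is_stale(entry: CachedReferral,
                      current_change_policy: Tuple[int, int],
                      now: float,
                      max_age: float) -> bool:
    """Return True when cached referral information should be discarded."""
    if current_change_policy != entry.change_policy:
        # A changed change_policy means the cached data is out of date.
        return True
    # Periodic purge: max_age is a client-chosen interval, not a protocol value.
    return now - entry.fetched_at >= max_age
```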
| </section> | ||||
| <section anchor="SEC11-trans-oview" numbered="true" toc="default"> | ||||
| <name>Overview of File Access Transitions</name> | ||||
| <t> | ||||
| File access transitions are of two types: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Those that involve a transition from accessing the current | ||||
| replica to another one in connection with either replication or migration. | ||||
| How these are dealt with is discussed in | ||||
| <xref target="SEC11-EFF" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| Those in which access to the current file system instance is retained, while | ||||
| the network path used to access that instance is changed. This case is | ||||
| discussed in <xref target="SEC11-nwa" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="SEC11-nwa" numbered="true" toc="default"> | ||||
| <name>Effecting Network Endpoint Transitions</name> | ||||
| <t> | ||||
| The endpoints used to access a particular file system instance | ||||
| may change in a number of ways, as listed below. In each of these | ||||
| cases, the same fsid, client IDs, filehandles, and stateids are | ||||
| used to continue access, with a continuity of lock state. In | ||||
| many cases, the same sessions can also be used. | ||||
| </t> | ||||
| <t> | ||||
| The appropriate action depends on the set of replacement addresses | ||||
| that are available for use | ||||
| (i.e., server endpoints that are server-trunkable with one previously | ||||
| being used). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When use of a particular address is to cease, and there is | ||||
| also another address | ||||
| currently in use that is server-trunkable with it, requests | ||||
| that would have been issued on the address whose use is to be | ||||
| discontinued can be issued on the remaining address(es). When an | ||||
| address is server-trunkable but not session-trunkable with the | ||||
| address whose use is to be discontinued, the request might need | ||||
| to be modified to reflect the fact that a different session will | ||||
| be used. | ||||
| </li> | ||||
| <li> | ||||
| When use of a particular connection is to cease, as indicated | ||||
| by receiving NFS4ERR_MOVED when using that connection, but | ||||
| that address is | ||||
| still indicated as accessible according to the appropriate | ||||
| file system location | ||||
| entries, it is likely that requests can be issued on a new | ||||
| connection of a different connection type once that connection | ||||
| is established. | ||||
| Since any two non-port-specific server endpoints that share a | ||||
| network address are inherently session-trunkable, the client | ||||
| can use BIND_CONN_TO_SESSION to access the existing session | ||||
| with the new connection. | ||||
| </li> | ||||
| <li> | ||||
| When there are no potential replacement addresses in use, but there | ||||
| are valid addresses session-trunkable with the one whose use is | ||||
| to be discontinued, the client can use BIND_CONN_TO_SESSION | ||||
| to access the existing session using the new address. Although | ||||
| the target session will generally be accessible, there may be | ||||
| rare situations in which that session is no longer accessible | ||||
| when an attempt is made to bind the new connection to it. In this | ||||
| case, the client can create a new session to enable continued | ||||
| access to the existing instance using the new connection, | ||||
| providing for the use of existing filehandles, stateids, and | ||||
| client IDs while supplying continuity of locking state. | ||||
| </li> | ||||
| <li> | ||||
| When there is no potential replacement address in use, and there | ||||
| are no valid addresses session-trunkable with the one whose use is | ||||
| to be discontinued, other server-trunkable addresses may be | ||||
| used to provide continued access. Although the use of CREATE_SESSION | ||||
| is available to provide continued access to the existing instance, | ||||
| servers have the option of providing continued access to the | ||||
| existing session through the new network access path in a fashion | ||||
| similar to that provided by session migration (see | ||||
| <xref target="SEC11-trans-locking" format="default"/>). | ||||
| To take advantage of this | ||||
| possibility, clients can perform an initial BIND_CONN_TO_SESSION, | ||||
| as in the previous case, and use CREATE_SESSION only if that fails. | ||||
| </li> | ||||
| </ul> | ||||
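| <t> | ||||
| The four cases above can be summarized as a small decision routine. The Python sketch below is non-normative. The function name, the boolean inputs, and the order in which the cases are tested are assumptions made for illustration; the text above does not define a precedence among the cases. | ||||
| </t> | ||||

```python
from enum import Enum
from typing import List


class Step(Enum):
    REISSUE_ON_ADDRESS_IN_USE = "reissue on remaining address"
    BIND_CONN_TO_SESSION = "BIND_CONN_TO_SESSION"
    CREATE_SESSION = "CREATE_SESSION"


def plan_endpoint_transition(replacement_in_use: bool,
                             same_address_still_listed: bool,
                             session_trunkable_available: bool,
                             server_trunkable_available: bool) -> List[Step]:
    """Return the steps to attempt, in order, for the four cases listed."""
    if replacement_in_use:
        # Another server-trunkable address is already in use.
        return [Step.REISSUE_ON_ADDRESS_IN_USE]
    if same_address_still_listed:
        # New connection of a different type to the same network address.
        return [Step.BIND_CONN_TO_SESSION]
    if session_trunkable_available or server_trunkable_available:
        # Try the existing session first; create a new one only on failure.
        return [Step.BIND_CONN_TO_SESSION, Step.CREATE_SESSION]
    return []  # no replacement endpoint is known
```

| <t> | ||||
| The last two cases yield the same plan. They differ only in how likely the initial BIND_CONN_TO_SESSION is to succeed. | ||||
| </t> | ||||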
| </section> | ||||
| <section anchor="SEC11-EFF" numbered="true" toc="default"> | ||||
| <name>Effecting File System Transitions</name> | ||||
| <t> | ||||
| There is a range of situations in which there is a change to be | ||||
| effected in the set of replicas used to access a particular | ||||
| file system. Some of these may involve an expansion or | ||||
| contraction of the set of replicas used as discussed in | ||||
| <xref target="SEC11-EFF-simul" format="default"/> below. | ||||
| </t> | ||||
| <t> | ||||
| For reasons explained in that section, most transitions will involve | ||||
| a transition from a single replica to a corresponding replacement | ||||
| replica. When effecting replica transition, some types of | ||||
| sharing between the replicas may affect handling of the | ||||
| transition as described in | ||||
| Sections <xref target="SEC11-EFF-fh" format="counter"/> | ||||
| through <xref target="SEC11-EFF-data" format="counter"/> below. | ||||
| The attribute fs_locations_info provides helpful information | ||||
| to allow the client to determine the degree of inter-replica | ||||
| sharing. | ||||
| </t> | ||||
| <t> | ||||
| With regard to some types of state, the degree of continuity | ||||
| across the transition depends on the occasion prompting the | ||||
| transition, with transitions initiated by the servers | ||||
| (i.e., migration) offering much more scope for a nondisruptive | ||||
| transition than cases in which the client on its own | ||||
| shifts its access to another replica (i.e., replication). | ||||
| This issue potentially applies to locking state and to session | ||||
| state, which are dealt with below as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| An introduction to the possible means of providing continuity in | ||||
| these areas appears in <xref target="SEC11-EFF-lock" format="default"/> below. | ||||
| </li> | ||||
| <li> | ||||
| Transparent State Migration is introduced in | ||||
| <xref target="SEC11-trans-locking" format="default"/>. | ||||
| The possible transfer of | ||||
| session state is addressed there as well. | ||||
| </li> | ||||
| <li> | ||||
| The client handling of transitions, including determining how to | ||||
| deal with the various means that the server might take to | ||||
| supply effective continuity of locking state, is discussed in | ||||
| <xref target="SEC11-trans-client" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| The source and destination servers' responsibilities | ||||
| in effecting Transparent State Migration | ||||
| of locking and session state are discussed in | ||||
| <xref target="SEC11-trans-server" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="SEC11-EFF-simul" numbered="true" toc="default"> | ||||
| <name>File System Transitions and Simultaneous Access</name> | ||||
| <t> | ||||
| The fs_locations_info attribute (described in | ||||
| <xref target="SEC11-li-new" format="default"/>) | ||||
| may indicate that two replicas | ||||
| may be used simultaneously, although some situations in which such | ||||
| simultaneous access is permitted are more appropriately described | ||||
| as instances of trunking (see <xref target="SEC11-USES-repl-trunk" format="default"/>). | ||||
| Although situations | ||||
| in which multiple replicas may be accessed simultaneously are | ||||
| somewhat similar to those in which a single replica is | ||||
| accessed by multiple network addresses, there are important | ||||
| differences since locking state is not shared among multiple | ||||
| replicas. | ||||
| </t> | ||||
| <t> | ||||
| Because of this difference in state handling, many clients will | ||||
| not have the ability to take advantage of the fact that such | ||||
| replicas represent the same data. Such clients will not be | ||||
| prepared to use multiple replicas simultaneously but will access | ||||
| each file system using only a single replica, although the | ||||
| replica selected might make multiple server-trunkable addresses | ||||
| available. | ||||
| </t> | ||||
| <t> | ||||
| Clients that are prepared to use multiple replicas simultaneously | ||||
| can divide opens among replicas however they choose. Once that | ||||
| choice is made, any subsequent transitions will treat the set of locking | ||||
| state associated with each replica as a single entity. | ||||
| </t> | ||||
| <t> | ||||
| For example, if one of the replicas becomes unavailable, access will be | ||||
| transferred to a different replica, which is also capable of | ||||
| simultaneous access with the one still in use. | ||||
| </t> | ||||
| <t> | ||||
| When there is no such replica, the transition may be to the | ||||
| replica already in use. At this point, the client has a | ||||
| choice between merging the locking state for the two replicas | ||||
| under the aegis of the sole replica in use or treating these | ||||
| separately until another replica capable of simultaneous | ||||
| access presents itself. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-EFF-fh" numbered="true" toc="default"> | ||||
| <name>Filehandles and File System Transitions</name> | ||||
| <t> | ||||
| There are a number of ways in which filehandles can be handled | ||||
| across a file system transition. These can be divided into | ||||
| two broad classes depending upon whether the two file systems | ||||
| across which the transition happens share sufficient state to | ||||
| effect some sort of continuity of file system handling. | ||||
| </t> | ||||
| <t> | ||||
| When there is no such cooperation in filehandle assignment, | ||||
| the two file systems are reported as being in different | ||||
| handle classes. In this case, | ||||
| all filehandles are assumed to expire as part of the | ||||
| file system transition. Note that this behavior does not | ||||
| depend on the fh_expire_type attribute and supersedes | ||||
| the specification | ||||
| of the FH4_VOL_MIGRATION bit, which only affects behavior when | ||||
| fs_locations_info is not available. | ||||
| </t> | ||||
| <t> | ||||
| When there is cooperation in filehandle assignment, | ||||
| the two file systems are reported as being in the same | ||||
| handle class. In this case, | ||||
| persistent filehandles remain valid after the file system | ||||
| transition, while volatile filehandles (excluding those | ||||
| that are only volatile due to the FH4_VOL_MIGRATION bit) are | ||||
| subject to expiration on the target server. | ||||
| </t> | ||||
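| <t> | ||||
| The two cases above can be expressed as a simple classification. The following Python sketch is illustrative only. The names FhFate and filehandle_fate are hypothetical. Treating a filehandle that is volatile only because of the FH4_VOL_MIGRATION bit as remaining valid is this example's reading of the exclusion stated above. | ||||
| </t> | ||||

```python
from enum import Enum


class FhFate(Enum):
    EXPIRED = "assumed to expire"
    VALID = "remains valid"
    MAY_EXPIRE = "subject to expiration on the target server"


def filehandle_fate(same_handle_class: bool, persistent: bool,
                    volatile_only_for_migration: bool = False) -> FhFate:
    """Classify a filehandle across a file system transition."""
    if not same_handle_class:
        # Different handle classes: every filehandle is assumed to expire,
        # regardless of fh_expire_type.
        return FhFate.EXPIRED
    if persistent or volatile_only_for_migration:
        # Same handle class: these filehandles carry over.
        return FhFate.VALID
    # Other volatile filehandles may expire on the target server.
    return FhFate.MAY_EXPIRE
```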
| </section> | ||||
| <section anchor="SEC11-EFF-fileid" numbered="true" toc="default"> | ||||
| <name>Fileids and File System Transitions</name> | ||||
| <t> | ||||
| In NFSv4.0, the issue of continuity of fileids in the event | ||||
| of a file system transition was not addressed. The general | ||||
| expectation had been that in situations in | ||||
| which the two file system instances are created by a single vendor | ||||
| using some sort of file system image copy, fileids would be | ||||
| consistent across the transition, while in the analogous | ||||
| multi-vendor transitions they would not. This poses difficulties, | ||||
| especially for the client without special knowledge | ||||
| of the transition mechanisms adopted by the server. Note | ||||
| that although fileid is not a <bcp14>REQUIRED</bcp14> attribute, many servers | ||||
| support fileids and many clients provide APIs that depend on fileids. | ||||
| </t> | ||||
| <t> | ||||
| It is important to note that while clients themselves may have no | ||||
| trouble with a fileid changing as a result of a file system | ||||
| transition event, applications do typically have access to the | ||||
| fileid (e.g., via stat). The result is that an | ||||
| application may work perfectly well if there is no file system | ||||
| instance transition or if any such transition is among instances | ||||
| created by a single vendor, yet be unable to deal with the | ||||
| situation in which a multi-vendor transition occurs at the wrong | ||||
| time. | ||||
| </t> | ||||
| <t> | ||||
| Providing the same fileids in a multi-vendor (multiple server | ||||
| vendors) environment has generally been held to be quite difficult. | ||||
| While there is work to be done, it needs to be pointed out that | ||||
| this difficulty is partly self-imposed. Servers have typically | ||||
| identified fileid with inode number, i.e., with a quantity used to | ||||
| find the file in question. This identification poses special | ||||
| difficulties for migration of a file system between vendors | ||||
| where assigning | ||||
| the same index to a given file may not be possible. Note here that | ||||
| a fileid is not required to be useful for finding the file in | ||||
| question; it is required only to be unique within the given file system. Servers | ||||
| prepared to accept a fileid as a single piece of metadata and store | ||||
| it apart from the value used to index the file information can | ||||
| relatively easily maintain a fileid value across a migration event, | ||||
| allowing a truly transparent migration event. | ||||
| </t> | ||||
| <t> | ||||
| In any case, where servers can provide continuity of fileids, they | ||||
| should, and the client should be able to find out that such | ||||
| continuity is available and take appropriate action. Information | ||||
| about the continuity (or lack thereof) of fileids across a file | ||||
| system transition is represented by specifying whether the file systems | ||||
| in question are of the same fileid class. | ||||
| </t> | ||||
| <t> | ||||
| Note that when consistent fileids do not exist across a | ||||
| transition (either because there is no continuity of fileids | ||||
| or because fileid is not a supported attribute on one of | ||||
| the instances involved), and there are | ||||
| no reliable filehandles across a transition event (either because | ||||
| there is no filehandle continuity or because the filehandles are | ||||
| volatile), the client is in a position where it cannot verify | ||||
| that files it was accessing before the transition are the | ||||
| same objects. It is forced to assume that no object has been | ||||
| renamed, and, unless there are guarantees that provide this | ||||
| (e.g., the file system is read-only), problems for applications | ||||
| may occur. Therefore, use of such configurations should be | ||||
| limited to situations where the problems that this may cause | ||||
| can be tolerated. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-EFF-fsid" numbered="true" toc="default"> | ||||
| <name>Fsids and File System Transitions</name> | ||||
| <t> | ||||
| Since fsids are generally only unique on a per-server basis, | ||||
| it is likely that they will change during a file system | ||||
| transition. | ||||
| Clients should not make the fsids received | ||||
| from the server visible to applications since they may not be | ||||
| globally unique, and because they may change during a file | ||||
| system transition event. Applications are best served if they | ||||
| are isolated from such transitions to the extent possible. | ||||
| </t> | ||||
| <t> | ||||
| Although normally a single source file system will transition | ||||
| to a single target file system, there is a provision for splitting | ||||
| a single source file system into multiple target file systems, by | ||||
| specifying the FSLI4F_MULTI_FS flag. | ||||
| </t> | ||||
| <section anchor="SEC11-EFF-fsid-split" numbered="true" toc="default"> | ||||
| <name>File System Splitting</name> | ||||
| <t> | ||||
| When a file system transition is made and the fs_locations_info | ||||
| indicates that the file system in question might be split into | ||||
| multiple file systems (via the FSLI4F_MULTI_FS flag), the client | ||||
| <bcp14>SHOULD</bcp14> do GETATTRs to determine the fsid attribute on all known | ||||
| objects within the file system undergoing transition to determine | ||||
| the new file system boundaries. | ||||
| </t> | ||||
| <t> | ||||
| Clients might choose to | ||||
| maintain the fsids passed to existing applications | ||||
| by mapping all of the fsids for the descendant file systems to | ||||
| the common fsid used for the original file system. | ||||
| </t> | ||||
| <t> | ||||
| Splitting a file system can be done on a transition between | ||||
| file systems of the same fileid | ||||
| class, since the fact that fileids are unique within the | ||||
| source file system ensures they will be unique in each of the | ||||
| target file systems. | ||||
| </t> | ||||
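| <t> | ||||
| The boundary discovery and fsid mapping described in this section might be sketched as follows. This Python example is non-normative. The function remap_after_split and the getattr_fsid callback (standing in for a GETATTR of the fsid attribute) are hypothetical, and fsids are represented as (major, minor) pairs. | ||||
| </t> | ||||

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Fsid = Tuple[int, int]  # (major, minor)


def remap_after_split(
    known_objects: Iterable[Hashable],
    getattr_fsid: Callable[[Hashable], Fsid],
    original_fsid: Fsid,
) -> Tuple[Dict[Fsid, List[Hashable]], Dict[Fsid, Fsid]]:
    """Group known objects by their new fsid and map each new fsid back to
    the fsid already exposed to applications."""
    members: Dict[Fsid, List[Hashable]] = defaultdict(list)
    for obj in known_objects:
        # One fsid query per known object reveals the new boundaries.
        members[getattr_fsid(obj)].append(obj)
    # Applications keep seeing the original fsid for every descendant.
    presented = {new_fsid: original_fsid for new_fsid in members}
    return dict(members), presented
```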
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-EFF-change" numbered="true" toc="default"> | ||||
| <name>The Change Attribute and File System Transitions</name> | ||||
| <t> | ||||
| Since the change attribute is defined as a server-specific one, | ||||
| change attributes fetched from one server are normally presumed to | ||||
| be invalid on another server. Such a presumption is troublesome | ||||
| since it would invalidate all cached change attributes, requiring | ||||
| refetching. Even more disruptive, the absence of any assured | ||||
| continuity for the change attribute means that even if the same | ||||
| value is retrieved on refetch, no conclusions can be drawn as to whether | ||||
| the object in question has changed. The identical change | ||||
| attribute could be merely an artifact of a modified file with | ||||
| a different change attribute construction algorithm, with that | ||||
| new algorithm just happening to result in an identical change | ||||
| value. | ||||
| </t> | ||||
| <t> | ||||
| When the two file systems have consistent change attribute formats, | ||||
| and this fact is communicated to the client by reporting them | ||||
| as being in the same change class, the | ||||
| client may assume a continuity of change attribute construction | ||||
| and handle this situation just as it would be handled without | ||||
| any file system transition. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-EFF-wv" numbered="true" toc="default"> | ||||
| <name>Write Verifiers and File System Transitions</name> | ||||
| <t> | ||||
| In a file system transition, the two file systems might be | ||||
| cooperating in the handling of unstably written data. | ||||
| Clients can determine if this is the | ||||
| case by seeing if the two file systems belong to the same | ||||
| write-verifier class. When this is the case, write | ||||
| verifiers returned | ||||
| from one system may be compared to those returned by the | ||||
| other and superfluous writes can be avoided. | ||||
| </t> | ||||
| <t> | ||||
| When two file systems belong to different | ||||
| write-verifier classes, any verifier | ||||
| generated by one must not be compared to one provided by the | ||||
| other. Instead, the two verifiers should be treated as not | ||||
| equal even when the values are identical. | ||||
| </t> | ||||
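| <t> | ||||
| The comparison rule above can be sketched in Python as follows. The function name and parameters are hypothetical and serve only to illustrate when unstably written data has to be sent again. | ||||
| </t> | ||||

```python
def must_resend_uncommitted(old_verifier: bytes,
                            new_verifier: bytes,
                            same_write_verifier_class: bool) -> bool:
    """Decide whether uncommitted writes must be reissued after a transition."""
    if not same_write_verifier_class:
        # Verifiers from different classes are never treated as equal,
        # even when their byte values happen to match.
        return True
    # Same class: an ordinary comparison avoids superfluous writes.
    return old_verifier != new_verifier
```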
| </section> | ||||
| <section anchor="SEC11-EFF-rdc" numbered="true" toc="default"> | ||||
| <name>READDIR Cookies and Verifiers and File System Transitions</name> | ||||
| <t> | ||||
| In a file system transition, the two file systems might be | ||||
| consistent in their handling of READDIR cookies and verifiers. | ||||
| Clients can determine if this is the | ||||
| case by seeing if the two file systems belong to the same | ||||
| readdir class. When this is the case, READDIR | ||||
| cookies and verifiers | ||||
| from one system will be recognized by the other, and | ||||
| READDIR operations started on one server can be validly | ||||
| continued on the other simply by presenting the | ||||
| cookie and verifier returned by a READDIR operation done | ||||
| on the first file system to the second. | ||||
| </t> | ||||
| <t> | ||||
| When two file systems belong to different | ||||
| readdir classes, any READDIR cookie and verifier | ||||
| generated by one is not valid on the second and must not | ||||
| be presented to that server by the client. The client | ||||
| should act as if the verifier were rejected. | ||||
| </t> | ||||
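| <t> | ||||
| A client-side sketch of this rule, in Python, is given below. It is illustrative only. The function name is hypothetical. Restarting the enumeration with a zero cookie and a zero verifier is an assumption about how a client would act once it treats the verifier as rejected. | ||||
| </t> | ||||

```python
from typing import Tuple

ZERO_VERIFIER = bytes(8)  # 8-byte all-zero verifier used for an initial READDIR


def readdir_resume_args(cookie: int, verifier: bytes,
                        same_readdir_class: bool) -> Tuple[int, bytes]:
    """Choose the cookie and verifier for the next READDIR after a transition."""
    if same_readdir_class:
        # The new server recognizes the old cookie and verifier.
        return cookie, verifier
    # Different readdir class: never present the old values; start over.
    return 0, ZERO_VERIFIER
```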
| </section> | ||||
| <section anchor="SEC11-EFF-data" numbered="true" toc="default"> | ||||
| <name>File System Data and File System Transitions</name> | ||||
| <t> | ||||
| When multiple replicas exist and are used simultaneously or in | ||||
| succession by a client, applications using them will normally expect | ||||
| that they contain either the same data or data that is consistent with | ||||
| the normal sorts of changes that are made by other clients | ||||
| updating the data of the file system | ||||
| (with metadata being the same to the degree indicated by the | ||||
| fs_locations_info attribute). However, when multiple file systems are | ||||
| presented as replicas of one another, the precise relationship | ||||
| between the data of one and the data of another is not, as a | ||||
| general matter, specified by the NFSv4.1 protocol. It is quite | ||||
| possible to present as replicas file systems where the data of | ||||
| those file systems is sufficiently different that some applications | ||||
| have problems dealing with the transition between replicas. The | ||||
| namespace will typically be constructed so that applications can | ||||
| choose an appropriate level of support, so that in one position in | ||||
| the namespace, a varied set of replicas might be listed, while in | ||||
| another, only those that are up-to-date would be considered replicas. | ||||
| The protocol does define three special cases of the relationship among | ||||
| replicas to be specified by the server and relied upon by clients: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When multiple replicas exist and are used simultaneously | ||||
| by a client (see the FSLI4BX_CLSIMUL definition within | ||||
| fs_locations_info), they must designate the same | ||||
| data. Where file systems are writable, a change made on | ||||
| one instance must be visible on all instances at the same | ||||
| time, regardless of whether the interrogated instance is the | ||||
| one on which the modification was done. | ||||
| This allows a client to use these replicas | ||||
| simultaneously without any special adaptation to the fact | ||||
| that there are multiple replicas, beyond adapting to the fact | ||||
| that locks obtained on one replica are maintained separately | ||||
| (i.e., under a different client ID). | ||||
| In this case, locks (whether share reservations or | ||||
| byte-range locks) and delegations obtained on one | ||||
| replica are immediately reflected on all replicas, in the | ||||
| sense that access from all other servers is prevented | ||||
| regardless of the replica used. However, because the servers are | ||||
| not required to treat two associated client IDs as | ||||
| representing the same client, it is best to | ||||
| access each file using only a single client ID. | ||||
| </li> | ||||
| <li> | ||||
| When one replica is designated as the successor instance to another | ||||
| existing instance after the return of NFS4ERR_MOVED (i.e., the case of | ||||
| migration), the client may depend on the fact that all changes | ||||
| written to stable storage on the original instance | ||||
| are written to stable storage of the successor (uncommitted | ||||
| writes are dealt with in <xref target="SEC11-EFF-wv" format="default"/> above). | ||||
| </li> | ||||
| <li> | ||||
| Where a file system is not writable but represents a read-only | ||||
| copy (possibly periodically updated) of a writable file system, | ||||
| clients have similar requirements with regard to the propagation | ||||
| of updates. They may need a guarantee that any change visible on | ||||
| the original file system instance must be immediately visible on | ||||
| any replica before the client transitions access to that replica, | ||||
| in order to avoid any possibility that a client, in effecting a transition to a | ||||
| replica, will see any reversion in file system state. | ||||
| The specific means of this guarantee varies based on the value of | ||||
| the fss_type field that is reported as part of the fs_status attribute | ||||
| (see <xref target="fs_status" format="default"/>). | ||||
| Since these file systems are presumed to be unsuitable for simultaneous use, | ||||
| there is no specification of how locking is handled; in general, locks obtained on one file | ||||
| system will be separate from those on others. | ||||
| Since these are expected to be read-only file systems, | ||||
| this is not likely to pose an issue for clients or applications. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When none of these special situations applies, there is no basis | ||||
| within the protocol for the client to make assumptions about the | ||||
| contents of a replica file system or its relationship to previous | ||||
| file system instances. Thus, switching between nominally | ||||
| identical read-write file systems is not possible when either the | ||||
| client does not use the fs_locations_info attribute or the server does not support it. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-EFF-lock" numbered="true" toc="default"> | ||||
| <name>Lock State and File System Transitions</name> | ||||
| <t> | ||||
| While accessing a file system, clients obtain locks enforced | ||||
| by the server, which may prevent actions by other clients | ||||
| that are inconsistent with those locks. | ||||
| </t> | ||||
| <t> | ||||
| When access is transferred between replicas, clients need to | ||||
| be assured that the actions disallowed by holding these locks | ||||
| cannot have occurred during the transition. This can be ensured | ||||
| by the methods below. Unless at least one of these is implemented, | ||||
| clients will not be assured of continuity of lock | ||||
| possession across a migration event: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Providing the client an opportunity to re-obtain its locks via a per-fs grace | ||||
| period on the destination server, denying all clients using the | ||||
| destination file system the | ||||
| opportunity to obtain new locks that conflict with those held | ||||
| by the transferred client as long as that client | ||||
| has not completed its per-fs grace period. Because the lock reclaim | ||||
| mechanism was originally defined to support server reboot, it | ||||
| implicitly assumes that filehandles will, upon reclaim, | ||||
| be the same as those at open. In the case of migration, this | ||||
| requires that source and destination servers use the same | ||||
| filehandles, as evidenced by using the same server scope | ||||
| (see <xref target="Server_Scope" format="default"/>) | ||||
| or by showing this agreement using fs_locations_info | ||||
| (see <xref target="SEC11-EFF-fh" format="default"/> above). | ||||
| </t> | ||||
| <t> | ||||
| Note that such a grace period can be implemented without | ||||
| interfering with the ability of non-transferred clients to | ||||
| obtain new locks while it is going on. As long as the destination | ||||
| server is aware of the transferred locks, it can distinguish requests | ||||
| to obtain new locks that conflict with existing locks | ||||
| from those that do not, allowing it to treat such client requests | ||||
| without reference to the ongoing grace period. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Locking state can be transferred as part of the transition | ||||
| by providing Transparent State Migration as | ||||
| described in <xref target="SEC11-trans-locking" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Of these, Transparent State Migration provides the smoother | ||||
| experience for clients in that there is no need to go through a | ||||
| reclaim process before new locks can be obtained; however, it requires | ||||
| a greater degree of inter-server coordination. In general, the | ||||
| servers taking part in migration are free to provide either | ||||
| facility. However, when the filehandles can differ across the | ||||
| migration event, Transparent State Migration is the only | ||||
| available means of providing the needed functionality. | ||||
| </t> | ||||
| <t> | ||||
| It should be noted that these two methods are not mutually | ||||
| exclusive and that a server might well provide both. In | ||||
| particular, if there is some circumstance preventing a | ||||
| specific lock from being transferred transparently, | ||||
| the destination server can allow it to be reclaimed by | ||||
| implementing a per-fs grace period for the migrated file system. | ||||
| </t> | ||||
| <section anchor="SEC11-EFF-lock-sc" numbered="true" toc="default"> | ||||
| <name>Security Consideration Related to Reclaiming Lock State after File System Transitions</name> | ||||
| <t> | ||||
| Although it is possible for a client reclaiming state to misrepresent | ||||
| its state in the same fashion as described in | ||||
| <xref target="reclaim_security_considerations" format="default"/>, most | ||||
| implementations providing for such reclamation in the case of | ||||
| file system transitions | ||||
| will have the ability to detect such misrepresentations. This limits | ||||
| the ability of unauthenticated clients to execute denial-of-service | ||||
| attacks in these circumstances. Nevertheless, the rules stated in | ||||
| <xref target="reclaim_security_considerations" format="default"/> regarding principal | ||||
| verification for reclaim requests apply in this situation as well. | ||||
| </t> | ||||
| <t> | ||||
| Typically, implementations that support file system transitions | ||||
| will have extensive information about the locks | ||||
| to be transferred. This is because of the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Since failure is not involved, there is no need to store locking | ||||
| information in persistent storage. | ||||
| </li> | ||||
| <li> | ||||
| There is no need, as there is in the failure case, to update | ||||
| multiple repositories containing locking state to keep them in | ||||
| sync. Instead, there is a one-time communication of locking | ||||
| state from the source to the destination server. | ||||
| </li> | ||||
| <li> | ||||
| Providing this information avoids potential interference with | ||||
| existing clients using the destination file system by denying | ||||
| them the ability to obtain new locks during the grace period. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When such detailed locking information, not necessarily including | ||||
| the associated stateids, is available: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| It is possible to detect reclaim requests that attempt to | ||||
| reclaim locks that did not exist before the transfer, rejecting | ||||
| them with NFS4ERR_RECLAIM_BAD (<xref target="err_RECLAIM_BAD" format="default"/>). | ||||
| </li> | ||||
| <li> | ||||
| It is possible, when dealing with non-reclaim requests, to determine | ||||
| whether they conflict with existing locks, eliminating the need | ||||
| to return NFS4ERR_GRACE (<xref target="err_GRACE" format="default"/>) on | ||||
| non-reclaim requests. | ||||
| </li> | ||||
| </ul> | ||||
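<t>
The two checks above can be sketched as follows. This is an illustrative sketch only, not part of the protocol: the Lock record, the conflicts() rule (whole-file locks, with no byte ranges), and handle_lock_request() are hypothetical names introduced here, and result codes are shown as strings rather than wire values.
</t>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    owner: str       # hypothetical lock-owner identifier
    file_id: str     # hypothetical file identifier
    exclusive: bool  # True for a write lock, False for a read lock


def conflicts(a: Lock, b: Lock) -> bool:
    # Simplifying assumption: whole-file locks. Two locks conflict when they
    # are on the same file, held by different owners, and either is exclusive.
    return a.file_id == b.file_id and a.owner != b.owner and (a.exclusive or b.exclusive)


def handle_lock_request(req: Lock, is_reclaim: bool, transferred: set, granted: set) -> str:
    """Decide a lock request on the destination during the per-fs grace period.

    `transferred` holds the locks communicated by the source server;
    `granted` holds the locks the destination has granted or re-established.
    """
    if is_reclaim:
        # A reclaim for a lock that did not exist before the transfer is rejected.
        if req not in transferred:
            return "NFS4ERR_RECLAIM_BAD"
        granted.add(req)
        return "NFS4_OK"
    # Non-reclaim request: because all existing locks are known, it can be
    # decided immediately instead of being refused with NFS4ERR_GRACE.
    for held in transferred | granted:
        if conflicts(req, held):
            return "NFS4ERR_DENIED"
    granted.add(req)
    return "NFS4_OK"
```

<t>
In this sketch, a non-reclaim request is checked against transferred locks even before they are reclaimed, which is what allows the grace period to proceed without blocking unrelated clients.
</t>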
| <t> | ||||
| It is possible for implementations of grace periods in connection | ||||
| with file system transitions not to have detailed locking | ||||
| information available at the destination server, in which case, | ||||
| the security situation is exactly as described in | ||||
| <xref target="reclaim_security_considerations" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="transferred_lease" numbered="true" toc="default"> | ||||
| <name>Leases and File System Transitions</name> | ||||
| <t> | ||||
| In the case of lease renewal, the client may not be | ||||
| submitting requests for a file system that has been transferred | ||||
| to another server. This can occur | ||||
| because of the lease renewal mechanism. The | ||||
| client renews the lease associated with all file systems | ||||
| when submitting | ||||
| a request on an associated session, regardless of the | ||||
| specific file system being referenced. | ||||
| </t> | ||||
| <t> | ||||
| In order for the client to schedule renewal of its lease | ||||
| where there is locking state that may have been relocated | ||||
| to the new server, the client | ||||
| must find out about lease relocation before that lease | ||||
| expires. To accomplish this, the SEQUENCE operation will | ||||
| return the status bit SEQ4_STATUS_LEASE_MOVED | ||||
| if responsibility for any of the renewed locking state | ||||
| has been transferred to a new server. This | ||||
| will continue until the client receives an | ||||
| NFS4ERR_MOVED error for each of the file systems for which | ||||
| there has been locking state relocation. | ||||
| </t> | ||||
| <t> | ||||
| When a client receives an SEQ4_STATUS_LEASE_MOVED indication from | ||||
| a server, for each file system of the server for which the client | ||||
| has locking state, the client should perform an operation. | ||||
| For simplicity, the client may choose to reference | ||||
| all file systems, but what is important | ||||
| is that it must reference all file systems for which there was | ||||
| locking state where that state has moved. Once the client | ||||
| receives an NFS4ERR_MOVED error for each such file system, | ||||
| the server will clear the SEQ4_STATUS_LEASE_MOVED indication. | ||||
| The client can terminate the process of checking file systems | ||||
| once this indication is cleared (but only if the client | ||||
| has received a reply for all outstanding SEQUENCE requests | ||||
| on all sessions it has with the server), since there are no others | ||||
| for which locking state has moved. | ||||
| </t> | ||||
| <t> | ||||
| A client may use GETATTR of the fs_status | ||||
| (or fs_locations_info) attribute on all of the file systems | ||||
| to get absence indications in a single (or a few) request(s), | ||||
| since absent file systems will not cause an error in this | ||||
| context. However, it still must do an operation that | ||||
| receives NFS4ERR_MOVED on each file system, in order to clear | ||||
| the SEQ4_STATUS_LEASE_MOVED indication. | ||||
| </t> | ||||
| <t> | ||||
| Once the set of file systems with transferred locking state | ||||
| has been determined, the client can follow the normal process | ||||
| to obtain the new server information (through the | ||||
| fs_locations and fs_locations_info attributes) and perform renewal | ||||
| of that lease on the new server, unless information in the | ||||
| fs_locations_info attribute shows that no state could have | ||||
| been transferred. If the server has not | ||||
| had state transferred to it transparently, the client | ||||
| will receive NFS4ERR_STALE_CLIENTID | ||||
| from the new server, | ||||
| as described above, and the client can then reclaim | ||||
| locks | ||||
| as is done in the event of server failure. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="transition_lease_time" numbered="true" toc="default"> | ||||
| <name>Transitions and the Lease_time Attribute</name> | ||||
| <t> | ||||
| In order that the client may appropriately manage its lease | ||||
| in the case of a file system transition, the destination server must | ||||
| establish proper values for the lease_time attribute. | ||||
| </t> | ||||
| <t> | ||||
| When state is transferred transparently, that state | ||||
| should include the correct value of the lease_time | ||||
| attribute. The lease_time attribute on the destination | ||||
| server must never be less than that on the source, since | ||||
| this would result in premature expiration of a lease | ||||
| granted by the source server. Upon transitions in which | ||||
| state is transferred transparently, the client is under | ||||
| no obligation to refetch the lease_time attribute and | ||||
| may continue to use the value | ||||
| previously fetched (on the source server). | ||||
| </t> | ||||
| <t> | ||||
| If state has not been transferred transparently, either | ||||
| because the associated servers are shown as having different | ||||
| eir_server_scope strings or because the client ID | ||||
| is rejected when presented to the new server, | ||||
| the client should fetch the value | ||||
| of lease_time on the new (i.e., destination) server, and | ||||
| use it for subsequent locking requests. However, the server | ||||
| must respect a grace | ||||
| period of at least as long as the lease_time on the source | ||||
| server, in order to ensure that clients have ample time to | ||||
| reclaim their lock before potentially conflicting | ||||
| non-reclaimed locks are granted. | ||||
| </t> | ||||
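<t>
The two lease_time constraints in this section reduce to simple lower bounds. The sketch below is illustrative only: the function names and the notion of a locally configured default are assumptions made for the example, and all values are in seconds.
</t>

```python
def destination_lease_time(source_lease: int, dest_default: int) -> int:
    # With transparent state transfer, the destination must never use a
    # lease_time shorter than the one the source granted.
    return max(source_lease, dest_default)


def destination_grace_period(source_lease: int, dest_configured_grace: int) -> int:
    # Without transparent state transfer, the grace period must last at least
    # as long as the source server's lease_time.
    return max(source_lease, dest_configured_grace)
```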
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-trans-locking" numbered="true" toc="default"> | ||||
| <name>Transferring State upon Migration</name> | ||||
| <t> | ||||
| When the transition is a result of a server-initiated decision | ||||
| to transition access, and the source and destination servers have | ||||
| implemented appropriate cooperation, it is possible to do the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Transfer locking state from the source to the destination | ||||
| server in a fashion similar to that provided by Transparent State | ||||
| Migration in NFSv4.0, as described in <xref target="RFC7931" format="default"/>. | ||||
| Server responsibilities are described in <xref target="SEC11-XS-lock" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| Transfer session state from the source to the destination | ||||
| server. Server responsibilities in effecting such a | ||||
| transfer are described in <xref target="SEC11-XS-session" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The means by which the client determines which of these transfer | ||||
| events has occurred are described in | ||||
| <xref target="SEC11-trans-client" format="default"/>. | ||||
| </t> | ||||
| <section anchor="V41p-pnfs" numbered="true" toc="default"> | ||||
| <name>Transparent State Migration and pNFS</name> | ||||
| <t> | ||||
| When pNFS is involved, the protocol is capable of supporting: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Migration of the Metadata Server (MDS), leaving the Data | ||||
| Servers (DSs) in place. | ||||
| </li> | ||||
| <li> | ||||
| Migration of the file system as a whole, including the MDS | ||||
| and associated DSs. | ||||
| </li> | ||||
| <li> | ||||
| Replacement of one DS by another. | ||||
| </li> | ||||
| <li> | ||||
| Migration of a pNFS file system to one in which pNFS is not used. | ||||
| </li> | ||||
| <li> | ||||
| Migration of a file system not using pNFS to one in which | ||||
| layouts are available. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that migration, per se, is only involved in the transfer of | ||||
| the MDS function. Although the servicing of a layout may be | ||||
| transferred from one data server to another, this is not done using | ||||
| the file system location attributes. The MDS can effect such | ||||
| transfers by recalling or revoking existing layouts and granting new | ||||
| ones on a different data server. | ||||
| </t> | ||||
| <t> | ||||
| Migration of the MDS function is directly supported by | ||||
| Transparent State Migration. Layout state will normally be | ||||
| transparently transferred, just as other state is. | ||||
| As a result, Transparent State Migration provides a framework in | ||||
| which, given appropriate inter-MDS data transfer, one MDS can | ||||
| be substituted for another. | ||||
| </t> | ||||
| <t> | ||||
| Migration of the file system function as a whole can be accomplished by | ||||
| recalling all layouts as part of the initial phase of the | ||||
| migration process. As a result, I/O will be done through the | ||||
| MDS during the migration process, and new layouts can be granted | ||||
| once the client is interacting with the new MDS. An MDS can | ||||
| also effect this sort of transition by revoking all layouts | ||||
| as part of Transparent State Migration, as long as the client is | ||||
| notified about the loss of locking state. | ||||
| </t> | ||||
| <t> | ||||
| In order to allow migration to a file system on which pNFS is | ||||
| not supported, clients need to be prepared for a situation in | ||||
| which layouts are not available or supported on the destination file | ||||
| system and so direct I/O requests to the destination | ||||
| server, rather than depending on layouts being available. | ||||
| </t> | ||||
| <t> | ||||
| Replacement of one DS by another is not addressed by migration as | ||||
| such but can be effected by an MDS recalling layouts for the DS | ||||
| to be replaced and issuing new ones to be served by the | ||||
| successor DS. | ||||
| </t> | ||||
| <t> | ||||
| Migration may transfer a file system from a server that does | ||||
| not support pNFS to one that does. In order to properly adapt | ||||
| to this situation, clients that support pNFS, but function | ||||
| adequately in its absence, should check for pNFS support when | ||||
| a file system is migrated and be prepared to use pNFS when | ||||
| support is available on the destination. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-trans-client" numbered="true" toc="default"> | ||||
| <name>Client Responsibilities When Access Is Transitioned</name> | ||||
| <t> | ||||
| For a client to respond to an access transition, it must become | ||||
| aware of it. The ways in which this can happen are discussed | ||||
| in <xref target="V41c-clrecov" format="default"/>, which discusses indications | ||||
| that a specific file system access path has transitioned as well as | ||||
| situations in which additional activity is necessary to | ||||
| determine the set of file systems that have been migrated. | ||||
| <xref target="V41c-migrdisc" format="default"/> goes on to complete the discussion | ||||
| of how the set of migrated file systems might be determined. | ||||
| Sections <xref target="V41c-omoved" format="counter"/> through | ||||
| <xref target="V41c-ssnwas" format="counter"/> | ||||
| discuss how the client should deal with | ||||
| each transition it becomes aware of, either directly or as a | ||||
| result of migration discovery. | ||||
| </t> | ||||
| <t> | ||||
| The following terms are used to describe client activities: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| "Transition recovery" refers to the process of restoring access | ||||
| to a file system on which NFS4ERR_MOVED was received. | ||||
| </li> | ||||
| <li> | ||||
| "Migration recovery" refers to that subset of transition recovery | ||||
| that applies when the file system has migrated to a different | ||||
| replica. | ||||
| </li> | ||||
| <li> | ||||
| "Migration discovery" refers to the process of determining which | ||||
| file system(s) have been migrated. It is necessary to avoid a situation in | ||||
| which leases could expire when a file system is not accessed for | ||||
| a long period of time, since a client unaware of the migration | ||||
| might be referencing an unmigrated file system and not renewing | ||||
| the lease associated with the migrated file system. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="V41c-clrecov" numbered="true" toc="default"> | ||||
| <name>Client Transition Notifications</name> | ||||
| <t> | ||||
| When there is a change in the network access | ||||
| path that a client is to use to access a file system, there | ||||
| are a number of related status indications with which clients | ||||
| need to deal: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| If an attempt is made to use or return a filehandle | ||||
| within a file system that is no longer accessible at the | ||||
| address previously used to access it, the | ||||
| error NFS4ERR_MOVED is returned. | ||||
| </t> | ||||
| <t> | ||||
| Exceptions are made to allow such filehandles to be used | ||||
| when interrogating a file system location attribute. | ||||
| This enables a client to determine | ||||
| a new replica's location or a new network access path. | ||||
| </t> | ||||
| <t> | ||||
| This condition continues on subsequent attempts to access | ||||
| the file system in question. The only way the client | ||||
| can avoid the error is to cease accessing the file system in | ||||
| question at its old server location and access it instead | ||||
| using a different address at which it is now available. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Whenever a client sends a SEQUENCE operation to a server that | ||||
| generated state held on that client and associated with a | ||||
| file system no longer accessible on that server, the response will contain | ||||
| the status bit SEQ4_STATUS_LEASE_MOVED, indicating that there has | ||||
| been a lease migration. | ||||
| </t> | ||||
| <t> | ||||
| This condition continues until the client acknowledges | ||||
| the notification by fetching a file system location attribute for the | ||||
| file system whose network access path is being changed. | ||||
| When there are multiple such file systems, a location attribute | ||||
| for each such file system needs to be fetched. The location | ||||
| attribute for all migrated file systems needs to be fetched | ||||
| in order to clear the condition. Even after the condition is cleared, the | ||||
| client needs to respond by using the location information | ||||
| to access the file system at its new location | ||||
| to ensure that leases are not needlessly expired. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Unlike NFSv4.0, in which the corresponding | ||||
| conditions are both errors and thus mutually exclusive, | ||||
| in NFSv4.1 the client can, | ||||
| and often will, receive both indications on the same | ||||
| request. As a result, implementations need to address the | ||||
| question of how to coordinate | ||||
| the necessary recovery actions when both indications | ||||
| arrive in the response to the same request. It should be noted | ||||
| that when processing an NFSv4 COMPOUND, the server | ||||
| will normally decide | ||||
| whether SEQ4_STATUS_LEASE_MOVED is to be set before | ||||
| it determines which file system will be referenced or whether | ||||
| NFS4ERR_MOVED is to be returned. | ||||
| </t> | ||||
| <t> | ||||
| Since these indications are not mutually exclusive in NFSv4.1, | ||||
| the following combinations are possible results when a COMPOUND | ||||
| is issued: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The COMPOUND status | ||||
| is NFS4ERR_MOVED, and SEQ4_STATUS_LEASE_MOVED is asserted. | ||||
| </t> | ||||
| <t> | ||||
| In this case, transition recovery is required. While it is | ||||
| possible that migration discovery is needed in addition, it | ||||
| is likely that only the accessed file system has transitioned. | ||||
| In any case, because addressing NFS4ERR_MOVED is necessary to | ||||
| allow the rejected requests to be processed on the target, | ||||
| dealing with it will typically have priority over | ||||
| migration discovery. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The COMPOUND status | ||||
| is NFS4ERR_MOVED, and SEQ4_STATUS_LEASE_MOVED is clear. | ||||
| </t> | ||||
| <t> | ||||
| In this case, transition recovery is also required. It is | ||||
| clear that migration discovery is not needed to find | ||||
| file systems that have been migrated other than the one | ||||
| returning NFS4ERR_MOVED. Cases in which this | ||||
| result can arise include a referral or a migration for which | ||||
| there is no associated locking state. This can also arise in | ||||
| cases in which an access path transition | ||||
| other than migration occurs within the same server. In such a | ||||
| case, there is no need to set SEQ4_STATUS_LEASE_MOVED, since | ||||
| the lease remains associated with the current server even though | ||||
| the access path has changed. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The COMPOUND status | ||||
| is not NFS4ERR_MOVED, and SEQ4_STATUS_LEASE_MOVED is asserted. | ||||
| </t> | ||||
| <t> | ||||
| In this case, no transition recovery activity is required on | ||||
| the file system(s) accessed by the request. However, to prevent avoidable | ||||
| lease expiration, migration discovery needs to be done. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The COMPOUND status | ||||
| is not NFS4ERR_MOVED, and SEQ4_STATUS_LEASE_MOVED is clear. | ||||
| </t> | ||||
| <t> | ||||
| In this case, neither transition-related activity nor migration | ||||
| discovery is required. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
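<t>
The four combinations above can be summarized as a small decision function. This is a hedged sketch rather than a required client structure: recovery_actions() and the action names are hypothetical, and the COMPOUND status is represented as a string.
</t>

```python
def recovery_actions(compound_status: str, lease_moved: bool) -> list:
    """Return the recovery work suggested by one COMPOUND reply, in priority order.

    `lease_moved` is True when SEQ4_STATUS_LEASE_MOVED is asserted in the
    SEQUENCE result of the same reply.
    """
    actions = []
    if compound_status == "NFS4ERR_MOVED":
        # Handled first, so that the rejected requests can be reissued
        # on the target.
        actions.append("transition_recovery")
    if lease_moved:
        # Other file systems may have moved; find them before leases expire.
        actions.append("migration_discovery")
    return actions
```

<t>
A client would start each returned action only if the corresponding activity is not already under way for that file system or server.
</t>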
| <t> | ||||
| Note that the specified actions only need to be taken if they are | ||||
| not already going on. For example, when NFS4ERR_MOVED is received | ||||
| while accessing a file system for which transition recovery is already occurring, the client | ||||
| merely waits for that recovery to be completed, while the receipt of | ||||
| the SEQ4_STATUS_LEASE_MOVED indication only | ||||
| needs to initiate migration discovery for a server if such | ||||
| discovery is not already underway for that server. | ||||
| </t> | ||||
| <t> | ||||
| The fact that a lease-migrated condition does not result in | ||||
| an error in NFSv4.1 has a number of important consequences. | ||||
| In addition to the fact that the two | ||||
| indications are not mutually exclusive, as discussed above, there are a number of | ||||
| issues that are important in considering implementation of | ||||
| migration discovery, as discussed in | ||||
| <xref target="V41c-migrdisc" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Because SEQ4_STATUS_LEASE_MOVED is not an error condition, it is possible | ||||
| for file systems whose access paths have not changed to be | ||||
| successfully accessed on a given server even though recovery | ||||
| is necessary for other file systems on the same server. As | ||||
| a result, access can take place while: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The migration discovery process is happening for that server. | ||||
| </li> | ||||
| <li> | ||||
| The transition recovery process is happening for other | ||||
| file systems connected to that server. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="V41c-migrdisc" numbered="true" toc="default"> | ||||
| <name>Performing Migration Discovery</name> | ||||
| <t> | ||||
| Migration discovery can be performed in the same context as | ||||
| transition recovery, allowing recovery for each migrated file | ||||
| system to be invoked as it is discovered. Alternatively, it may | ||||
| be done in a separate migration discovery thread, allowing | ||||
| migration discovery to be done in parallel with | ||||
| one or more instances of transition recovery. | ||||
| </t> | ||||
| <t> | ||||
| In either case, because the lease-migrated indication | ||||
| does not result in an error, other access to file systems on the | ||||
| server can proceed normally, with the possibility that further | ||||
| such indications will be received, raising the issue of how | ||||
| such indications are to be dealt with. In general: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| No action needs to be taken for such indications received by any | ||||
| threads performing migration discovery, since continuation of that | ||||
| work will address the issue. | ||||
| </li> | ||||
| <li> | ||||
| In other cases in which migration discovery is currently being performed, | ||||
| nothing further needs to be done to respond to such lease | ||||
| migration indications, as long as one can be certain that the migration | ||||
| discovery process would deal with those indications. See below for details. | ||||
| </li> | ||||
| <li> | ||||
| For such indications received in all other contexts, the | ||||
| appropriate response is to initiate or otherwise provide for the | ||||
| execution of migration discovery for file systems | ||||
| associated with the server IP address returning the indication. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| This leaves a potential difficulty in situations in which the | ||||
| migration discovery process is near to completion but is still | ||||
| operating. One should not ignore a SEQ4_STATUS_LEASE_MOVED indication if | ||||
| the migration discovery process is not able to respond to | ||||
| the discovery of additional migrating file | ||||
| systems without additional aid. A further complexity relevant in | ||||
| addressing such situations is that a lease-migrated indication may | ||||
| reflect the server's state at the time the SEQUENCE operation | ||||
| was processed, which may be different from that in effect at the | ||||
| time the response is received. Because new migration events | ||||
| may occur at any time, and because a SEQ4_STATUS_LEASE_MOVED indication may reflect | ||||
| the situation in effect a considerable time before the indication | ||||
| is received, special care needs to be taken to ensure that SEQ4_STATUS_LEASE_MOVED | ||||
| indications are not inappropriately ignored. | ||||
| </t> | ||||
| <t> | ||||
| A useful approach to this issue involves the use of separate | ||||
| externally-visible migration discovery states for each server. | ||||
| Separate values could represent the various possible states for | ||||
| the migration discovery process for a server: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Non-operation, in which migration discovery is not being | ||||
| performed. | ||||
| </li> | ||||
| <li> | ||||
| Normal operation, in which there is an ongoing scan for | ||||
| migrated file systems. | ||||
| </li> | ||||
| <li> | ||||
| Completion/verification of migration discovery processing, | ||||
| in which the possible completion of migration discovery | ||||
| processing needs to be verified. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Given that framework, migration discovery processing would proceed | ||||
| as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| While in the normal-operation state, the thread performing | ||||
| discovery would fetch, for | ||||
| successive file systems known to the client on the server being | ||||
| worked on, a file system location attribute plus the fs_status attribute. | ||||
| </li> | ||||
| <li> | ||||
| If the fs_status attribute indicates that the file system | ||||
| is a migrated one (i.e., fss_absent is true, and | ||||
| fss_type != STATUS4_REFERRAL), then a migrated file system has | ||||
| been found. In this situation, it is likely | ||||
| that the fetch of the file system location attribute has | ||||
| cleared one of the file systems contributing to the | ||||
| lease-migrated indication. | ||||
| </li> | ||||
| <li> | ||||
| In cases in which that happened, the thread cannot know whether | ||||
| the lease-migrated indication has been cleared, and so it enters the | ||||
| completion/verification state and proceeds to issue a COMPOUND | ||||
| to see if the SEQ4_STATUS_LEASE_MOVED indication has been cleared. | ||||
| </li> | ||||
| <li> | ||||
| When the discovery process is in the completion/verification state, | ||||
| if other requests get a lease-migrated indication, | ||||
| they note that it was received. Later, the existence of such | ||||
| indications is used when the request completes, as described below. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When the request used in the completion/verification state completes: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If a lease-migrated indication is returned, the discovery | ||||
| continues normally. Note that this is so even if all file systems | ||||
| have been traversed, since new migrations could have occurred | ||||
| while the process was going on. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, if there is any record that other requests saw a | ||||
| lease-migrated indication while the request was occurring, | ||||
| that record is cleared, and the verification request is retried. The discovery | ||||
| process remains in the completion/verification state. | ||||
| </li> | ||||
| <li> | ||||
| If there have been no lease-migrated indications, the work of | ||||
| migration discovery is considered completed, and it enters the | ||||
| non-operation state. Once it enters this state, subsequent | ||||
| lease-migrated indications will trigger a new migration discovery | ||||
| process. | ||||
| </li> | ||||
| </ul> | ||||
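<t>
The per-server discovery states and the transitions described above can be sketched as a small state machine. This is one possible arrangement, not a mandated design: the class, method, and return-value names are hypothetical, and the sketch assumes that the record of indications seen by other requests is reset on entry to the completion/verification state.
</t>

```python
from enum import Enum


class DiscoveryState(Enum):
    NON_OPERATION = "non-operation"
    NORMAL = "normal-operation"
    VERIFYING = "completion/verification"


class MigrationDiscovery:
    """Per-server migration discovery state, as seen by all client threads."""

    def __init__(self):
        self.state = DiscoveryState.NON_OPERATION
        self.seen_by_others = False

    def on_lease_moved(self):
        # Called when a request outside the discovery thread sees
        # SEQ4_STATUS_LEASE_MOVED in a reply from this server.
        if self.state is DiscoveryState.NON_OPERATION:
            self.state = DiscoveryState.NORMAL  # begin a new scan
        elif self.state is DiscoveryState.VERIFYING:
            self.seen_by_others = True          # note it for the verifier
        # In NORMAL, the ongoing scan will deal with the indication.

    def on_migrated_fs_found(self):
        # Called by the discovery thread after fetching the location
        # attribute of a file system that fs_status reports as migrated.
        if self.state is DiscoveryState.NORMAL:
            self.seen_by_others = False
            self.state = DiscoveryState.VERIFYING

    def on_verification_reply(self, lease_moved: bool) -> str:
        # Called when the COMPOUND issued in the VERIFYING state completes.
        if lease_moved:
            self.state = DiscoveryState.NORMAL
            return "continue_scan"
        if self.seen_by_others:
            self.seen_by_others = False
            return "retry_verification"
        self.state = DiscoveryState.NON_OPERATION
        return "done"
```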
| <t> | ||||
| It should be noted that the process described above is not | ||||
| guaranteed to terminate, as a long series of new migration | ||||
| events might continually delay the clearing of the SEQ4_STATUS_LEASE_MOVED | ||||
| indication. To prevent unnecessary lease expiration, it is | ||||
| appropriate for clients | ||||
| to use the discovery of migrations to effect lease | ||||
| renewal immediately, rather than waiting for the clearing of the | ||||
| SEQ4_STATUS_LEASE_MOVED indication when the complete set of migrations is | ||||
| available. | ||||
| </t> | ||||
| <t> | ||||
| Migration discovery needs to be provided as described above. This | ||||
| ensures that the client discovers file system migrations soon | ||||
| enough to renew its leases on each destination server before they | ||||
| expire. Non-renewal of leases can lead to loss of locking state. | ||||
| While the consequences of such | ||||
| loss can be ameliorated through implementations of courtesy locks, | ||||
| servers are under no obligation to do so, and a conflicting lock request | ||||
| may mean that a lock is revoked unexpectedly. Clients should be aware | ||||
| of this possibility. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="V41c-omoved" numbered="true" toc="default"> | ||||
| <name>Overview of Client Response to NFS4ERR_MOVED</name> | ||||
| <t> | ||||
| This section outlines a way in which a client that receives | ||||
| NFS4ERR_MOVED can effect transition recovery by using a new | ||||
| server or server endpoint | ||||
| if one is available. As part of that process, it will | ||||
| determine: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Whether the NFS4ERR_MOVED indicates migration has occurred, | ||||
| or whether it indicates another sort of file system | ||||
| access transition as discussed | ||||
| in <xref target="SEC11-nwa" format="default"/> above. | ||||
| </li> | ||||
| <li> | ||||
| In the case of migration, whether Transparent State | ||||
| Migration has occurred. | ||||
| </li> | ||||
| <li> | ||||
| Whether any state has been lost during the process of | ||||
| Transparent State Migration. | ||||
| </li> | ||||
| <li> | ||||
| Whether sessions have been transferred as part of Transparent | ||||
| State Migration. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| During the first phase of this process, the client proceeds to | ||||
| examine file system location entries to find the initial | ||||
| network address | ||||
| it will use to continue access | ||||
| to the file system or its replacement. | ||||
| For each location entry that the client examines, the process | ||||
| consists of five steps: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Performing an EXCHANGE_ID | ||||
| directed at the location address. This operation is used to | ||||
| register the client owner (in the form of a client_owner4) | ||||
| with the server, to obtain a client ID | ||||
| to be used subsequently to communicate with it, to obtain that | ||||
| client ID's confirmation status, and to determine server_owner4 | ||||
| and scope for the purpose of determining if the entry | ||||
| is trunkable with the address | ||||
| previously being used to access the file system (i.e., that | ||||
| it represents another network access path to the same | ||||
| file system and can share locking state with it). | ||||
| </li> | ||||
| <li> | ||||
| Making an initial determination of whether migration has | ||||
| occurred. The initial determination will be based | ||||
| on whether the EXCHANGE_ID results indicate that the | ||||
| current location element is server-trunkable with that | ||||
| used to access the file system when access | ||||
| was terminated by receiving NFS4ERR_MOVED. | ||||
| If it is, then migration has not occurred. In that case, the | ||||
| transition is | ||||
| dealt with, at least initially, as one involving continued | ||||
| access to the same file system on the same server through | ||||
| a new network address. | ||||
| </li> | ||||
| <li> | ||||
| Obtaining access to existing session state or creating new | ||||
| sessions. How this is done depends on the initial | ||||
| determination of whether migration has occurred and | ||||
| can be done as described in <xref target="V41c-ssmig" format="default"/> below | ||||
| in the case of migration or as described in | ||||
| <xref target="V41c-ssnwas" format="default"/> below | ||||
| in the case of a network address transfer without migration. | ||||
| </li> | ||||
| <li> | ||||
| Verifying the trunking relationship assumed in step | ||||
| 2 as discussed in <xref target="PREP-trunk-verify" format="default"/>. | ||||
| Although this step will generally confirm the initial | ||||
| determination, it is possible for verification to invalidate | ||||
| the initial determination of network address shift (without | ||||
| migration) and instead determine that migration had occurred. | ||||
| There is no need to redo | ||||
| step 3 above, since it will be possible to continue use of the | ||||
| session established already. | ||||
| </li> | ||||
| <li> | ||||
| Obtaining access to existing locking state and/or | ||||
| re-obtaining it. How this is done depends on the final | ||||
| determination of whether migration has occurred and | ||||
| can be done as described below in <xref target="V41c-ssmig" format="default"/> | ||||
| in the case of migration or as described in | ||||
| <xref target="V41c-ssnwas" format="default"/> | ||||
| in the case of a network address transfer without migration. | ||||
| </li> | ||||
| </ol> | ||||
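The initial determination in step 2 can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the `ServerIdentity` class and both function names are hypothetical, and the sketch assumes that two addresses are server-trunkable when the EXCHANGE_ID results agree on server scope and so_major_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdentity:
    """Values a client might record from an EXCHANGE_ID result (hypothetical type)."""
    so_major_id: bytes   # taken from eir_server_owner
    so_minor_id: int     # taken from eir_server_owner
    server_scope: bytes  # taken from eir_server_scope


def is_server_trunkable(old: ServerIdentity, new: ServerIdentity) -> bool:
    # Assumption: matching scope and major ID identify the same server.
    return (old.server_scope == new.server_scope
            and old.so_major_id == new.so_major_id)


def initial_determination(old: ServerIdentity, new: ServerIdentity) -> str:
    """Step 2: a provisional answer that step 4 may later overturn."""
    if is_server_trunkable(old, new):
        return "address-transfer"  # same server reached through a new address
    return "migration"
```

Because the result is only provisional, a client built this way would keep the old identity around until the verification in step 4 has completed.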
| <t> | ||||
| Once the initial address has been determined, clients are free | ||||
| to apply an abbreviated process to find additional addresses | ||||
| trunkable with it (clients may seek session-trunkable or | ||||
| server-trunkable addresses depending on whether they support | ||||
| client ID trunking). During this later phase of the process, | ||||
| further location entries are examined using the abbreviated | ||||
| procedure specified below: | ||||
| </t> | ||||
| <ol spacing="normal" type="%C:"> | ||||
| <li> | ||||
| Before the EXCHANGE_ID, the fs name of the location | ||||
| entry is examined, and if it | ||||
| does not match that currently being used, the entry is ignored. | ||||
| Otherwise, one proceeds as specified by step 1 above. | ||||
| </li> | ||||
| <li> | ||||
| In the case that the network address is session-trunkable with one | ||||
| used previously, a BIND_CONN_TO_SESSION is used to access that | ||||
| session using the new network address. Otherwise, or if the bind | ||||
| operation fails, a CREATE_SESSION is done. | ||||
| </li> | ||||
| <li> | ||||
| The verification procedure referred to in step 4 above is | ||||
| used. However, if it fails, the entry is ignored and the next | ||||
| available entry is used. | ||||
| </li> | ||||
| </ol> | ||||
| </section> | ||||
| <section anchor="V41c-ssmig" numbered="true" toc="default"> | ||||
| <name>Obtaining Access to Sessions and State after Migration</name> | ||||
| <t> | ||||
| In the event that migration has occurred, migration recovery | ||||
| will involve determining whether Transparent State Migration has | ||||
| occurred. This decision is made based on the client ID returned | ||||
| by the EXCHANGE_ID and the reported confirmation status. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the client ID is an unconfirmed client ID not previously known | ||||
| to the client, then Transparent State Migration has not occurred. | ||||
| </li> | ||||
| <li> | ||||
| If the client ID is a confirmed client ID previously known | ||||
| to the client, then any transferred state would have been | ||||
| merged with an existing client ID representing the client to the | ||||
| destination server. In this state merger case, Transparent | ||||
| State Migration might | ||||
| or might not have occurred, and a determination as to whether | ||||
| it has occurred is deferred until sessions are established | ||||
| and the client is ready to begin state recovery. | ||||
| </li> | ||||
| <li> | ||||
| If the client ID is a confirmed client ID not previously known | ||||
| to the client, then the client can conclude that the | ||||
| client ID was transferred as part of Transparent State Migration. | ||||
| In this transferred client ID case, Transparent State Migration | ||||
| has occurred, although some state might have been lost. | ||||
| </li> | ||||
| </ul> | ||||
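The three cases above amount to a small decision table keyed on the confirmation status and on whether the client already knew the client ID. The sketch below is illustrative only; the function name and the returned labels are hypothetical, and the combination the list does not mention is reported as not covered rather than guessed at.

```python
def classify_client_id(confirmed: bool, previously_known: bool) -> str:
    """Map the EXCHANGE_ID outcome to one of the cases described above."""
    if not confirmed and not previously_known:
        # Fresh client ID: nothing was transferred.
        return "no-transparent-state-migration"
    if confirmed and previously_known:
        # Any transferred state was merged; the decision is deferred.
        return "state-merger"
    if confirmed and not previously_known:
        # The client ID itself arrived with the migrated state.
        return "transferred-client-id"
    # Unconfirmed but previously known: not addressed by the list above.
    return "not-covered"
```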
| <t> | ||||
| Once the client ID has been obtained, it is necessary to | ||||
| obtain access to sessions to continue communication with the | ||||
| new server. | ||||
| In any of the cases in which Transparent State Migration | ||||
| has occurred, it is possible that a session was transferred | ||||
| as well. To deal with that possibility, clients can, after | ||||
| doing the EXCHANGE_ID, issue a BIND_CONN_TO_SESSION to | ||||
| connect the transferred session to a connection to the new | ||||
| server. If that fails, it is an indication that the session | ||||
| was not transferred and that a new session needs to be created to | ||||
| take its place. | ||||
| </t> | ||||
| <t> | ||||
| In some situations, it is possible for a BIND_CONN_TO_SESSION | ||||
| to succeed without session migration having occurred. If | ||||
| state merger has taken place, then the associated client ID | ||||
| may have already had a set of existing sessions, with it | ||||
| being possible that the session ID of a given session is the | ||||
| same as one that might have been migrated. In that event, | ||||
| a BIND_CONN_TO_SESSION might succeed, even though there | ||||
| could have been no migration of the session with that session ID. | ||||
| In such cases, the client will receive sequence errors when the | ||||
| slot sequence values used are not appropriate on the new | ||||
| session. When this occurs, the client can create a new | ||||
| session and cease using the existing one. | ||||
| </t> | ||||
| <t> | ||||
| Once the client has determined the initial migration status, | ||||
| and determined that there was a shift to a new server, it | ||||
| needs to re-establish its locking state, if possible. To enable | ||||
| this to happen without loss of the guarantees normally provided by | ||||
| locking, the destination server needs to implement a per-fs grace | ||||
| period in all cases in which lock state was lost, including | ||||
| those in which Transparent State Migration was not | ||||
| implemented. Each client for which there was a transfer of locking | ||||
| state to the new server will have the duration of the grace period | ||||
| to reclaim its locks, from the time its locks were transferred. | ||||
| </t> | ||||
| <t> | ||||
| Clients need to deal with the following cases: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| In the state merger case, it is possible that the server | ||||
| has not attempted Transparent State Migration, | ||||
| in which case state may have been | ||||
| lost without it being reflected in the SEQ4_STATUS bits. | ||||
| To determine whether this has happened, the client can use | ||||
| TEST_STATEID to check whether the stateids created on the | ||||
| source server are still accessible on the destination server. | ||||
| Once a single stateid is found to have been successfully | ||||
| transferred, the client can conclude that Transparent State | ||||
| Migration was begun, and any failure to transport all of the | ||||
| stateids will be reflected in the SEQ4_STATUS bits. Otherwise, | ||||
| Transparent State Migration has not occurred. | ||||
| </li> | ||||
| <li> | ||||
| In a case in which Transparent State Migration has not | ||||
| occurred, the client can use the per-fs grace period provided | ||||
| by the destination server to reclaim locks that were held on | ||||
| the source server. | ||||
| </li> | ||||
| <li> | ||||
| In a case in which Transparent State Migration has | ||||
| occurred, and no lock state was lost (as shown by SEQ4_STATUS | ||||
| flags), no lock reclaim is necessary. | ||||
| </li> | ||||
| <li> | ||||
| In a case in which Transparent State Migration has | ||||
| occurred, and some lock state was lost (as shown by SEQ4_STATUS | ||||
| flags), existing stateids need to be checked for validity | ||||
| using TEST_STATEID, and reclaim used to re-establish any that | ||||
| were not transferred. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs | ||||
| value of TRUE needs to be done before | ||||
| normal use of the file system, including obtaining new locks for the | ||||
| file system. This applies even if no locks were lost and there | ||||
| was no need for any to be reclaimed. | ||||
| </t> | ||||
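The lock recovery cases above, together with the RECLAIM_COMPLETE requirement, can be sketched as a function that returns the client's remaining steps. This is a hedged illustration: the function names and the action strings are hypothetical, and the inputs are assumed to have been derived already from the client ID classification, the SEQ4_STATUS flags, and (in the state merger case) TEST_STATEID results.

```python
from typing import Iterable, List


def merger_case_tsm_occurred(stateid_valid: Iterable[bool]) -> bool:
    """State merger case: one surviving stateid shows that migration was begun."""
    return any(stateid_valid)


def recovery_actions(tsm_occurred: bool, state_lost: bool) -> List[str]:
    """List the steps a client still has to take for the migrated file system."""
    actions: List[str] = []
    if not tsm_occurred:
        # No state was transferred, so every lock has to be reclaimed.
        actions.append("reclaim locks during the per-fs grace period")
    elif state_lost:
        # Partial transfer: find out which stateids survived.
        actions.append("check existing stateids with TEST_STATEID")
        actions.append("reclaim locks whose stateids were not transferred")
    # Required in every case, even when nothing was lost.
    actions.append("RECLAIM_COMPLETE with rca_one_fs TRUE")
    return actions
```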
| </section> | ||||
| <section anchor="V41c-ssnwas" numbered="true" toc="default"> | ||||
| <name>Obtaining Access to Sessions and State after Network Address Transfer</name> | ||||
| <t> | ||||
| The case in which there is a transfer to a new network | ||||
| address without migration is similar to that described | ||||
| in <xref target="V41c-ssmig" format="default"/> above in that there is a need to | ||||
| obtain access to needed sessions and locking state. However, | ||||
| the details are simpler and will vary depending on the | ||||
| type of trunking between the address receiving | ||||
| NFS4ERR_MOVED and that to which the transfer is to be made. | ||||
| </t> | ||||
| <t> | ||||
| To make a session available for use, a BIND_CONN_TO_SESSION | ||||
| should be used to obtain access to the session previously | ||||
| in use. Only if this fails should a CREATE_SESSION be done. | ||||
| While this procedure mirrors that in <xref target="V41c-ssmig" format="default"/> | ||||
| above, | ||||
| there is an important difference in that preservation of the | ||||
| session is not purely optional but depends on the type of | ||||
| trunking. | ||||
| </t> | ||||
| <t> | ||||
| Access to appropriate locking state will generally need no actions | ||||
| beyond access to the session. However, the SEQ4_STATUS bits need to be | ||||
| checked for lost locking state, including the need to reclaim | ||||
| locks after a server reboot, since there is always a possibility | ||||
| of locking state being lost. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SEC11-trans-server" numbered="true" toc="default"> | ||||
| <name>Server Responsibilities Upon Migration</name> | ||||
| <t> | ||||
| In the event of file system migration, when the client connects | ||||
| to the destination server, that server needs to be able to provide the | ||||
| client continued access to the files it had open on the source server. | ||||
| There are two ways to provide this: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| By provision of an fs-specific grace period, allowing the client the | ||||
| ability to reclaim its locks, in a fashion similar to what would | ||||
| have been done in the case of recovery from a server restart. See | ||||
| <xref target="SEC11-XS-reclaim" format="default"/> for a more complete | ||||
| discussion. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| By implementing Transparent State Migration possibly in | ||||
| connection with session migration, the server can provide | ||||
| the client immediate access to the state built up on the | ||||
| source server on the destination server. | ||||
| </t> | ||||
| <t> | ||||
| These features are discussed separately in Sections | ||||
| <xref target="SEC11-XS-lock" format="counter"/> and | ||||
| <xref target="SEC11-XS-session" format="counter"/>, | ||||
| which discuss Transparent State Migration and session | ||||
| migration, respectively. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| All the features described above can involve transfer of | ||||
| lock-related information between source and destination | ||||
| servers. In some cases, this transfer is a necessary part | ||||
| of the implementation, while in other cases, it is a helpful | ||||
| implementation aid, which servers might or might not use. | ||||
| The subsections below discuss the information that would be | ||||
| transferred but do not define the specifics of the transfer | ||||
| protocol. This is left as an implementation choice, although | ||||
| standards in this area could be developed at a later time. | ||||
| </t> | ||||
| <section anchor="SEC11-XS-reclaim" numbered="true" toc="default"> | ||||
| <name>Server Responsibilities in Effecting State Reclaim after Migration</name> | ||||
| <t> | ||||
| In this case, the destination server needs no knowledge of | ||||
| the locks held | ||||
| on the source server. It relies on the clients to accurately report | ||||
| (via reclaim operations) the locks previously held, and does not allow | ||||
| new locks to be granted on migrated file systems until the grace | ||||
| period expires. Disallowing of new locks applies to | ||||
| all clients accessing these file systems, while grace period | ||||
| expiration occurs for each migrated client independently. | ||||
| </t> | ||||
| <t> | ||||
| During this grace period, clients have the opportunity to use | ||||
| reclaim operations to obtain locks for file system objects within | ||||
| the migrated file system, in the same way that they do when | ||||
| recovering from server restart, and the servers typically | ||||
| rely on clients to accurately report their locks, although they | ||||
| have the option of subjecting these requests to verification. | ||||
| If the clients only reclaim locks held on the source server, no | ||||
| conflict can arise. Once the client has reclaimed its locks, | ||||
| it indicates the completion of lock reclamation by performing a | ||||
| RECLAIM_COMPLETE specifying rca_one_fs as TRUE. | ||||
| </t> | ||||
| <t> | ||||
| While it is not necessary for source and destination servers | ||||
| to cooperate to transfer information about locks, implementations | ||||
| are well advised to consider transferring the following | ||||
| useful information: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If information about the set of clients that have | ||||
| locking state for the transferred file system is made available, | ||||
| the destination | ||||
| server will be able to terminate the grace period once all | ||||
| such clients have reclaimed their locks, allowing normal | ||||
| locking activity to resume earlier than it would have otherwise. | ||||
| </li> | ||||
| <li> | ||||
| Locking summary information for individual clients (at various | ||||
| possible levels of detail) can be used to detect | ||||
| some instances in which clients do not accurately represent the | ||||
| locks held on the source server. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="SEC11-XS-lock" numbered="true" toc="default"> | ||||
| <name>Server Responsibilities in Effecting Transparent State Migration</name> | ||||
| <t> | ||||
| The basic responsibility of the source server in effecting | ||||
| Transparent State Migration is to make available to the | ||||
| destination server a description of each piece of locking state | ||||
| associated with the file system being migrated. In addition to | ||||
| client id string and verifier, the source server needs to provide | ||||
| for each stateid: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The stateid including the current sequence value. | ||||
| </li> | ||||
| <li> | ||||
| The associated client ID. | ||||
| </li> | ||||
| <li> | ||||
| The handle of the associated file. | ||||
| </li> | ||||
| <li> | ||||
| The type of the lock, such as open, byte-range lock, delegation, | ||||
| or layout. | ||||
| </li> | ||||
| <li> | ||||
| For locks such as opens and byte-range locks, there will be | ||||
| information about the owner(s) of the lock. | ||||
| </li> | ||||
| <li> | ||||
| For recallable/revocable lock types, the current recall status | ||||
| needs to be included. | ||||
| </li> | ||||
| <li> | ||||
| For each lock type, there will be associated type-specific | ||||
| information. For opens, this will include share and deny mode | ||||
| while for byte-range locks and layouts, there will be a type and | ||||
| a byte-range. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Such information will most probably be organized by client id string | ||||
| on the destination server | ||||
| so that it can be used to provide appropriate context to each client | ||||
| when it makes itself known to the destination server. Issues connected with a | ||||
| client impersonating another by presenting another client's client | ||||
| id string can be addressed using NFSv4.1 state protection features, | ||||
| as described in <xref target="SECCON" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| A further server responsibility concerns locks that are revoked | ||||
| or otherwise lost during the process of file system migration. | ||||
| Because locks that appear to be lost during the process of | ||||
| migration will be reclaimed by the client, the servers have to | ||||
| take steps to ensure that locks revoked shortly before or shortly | ||||
| after migration are not inadvertently allowed to be reclaimed | ||||
| in situations in which the continuity of lock possession | ||||
| cannot be assured. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For locks lost on the source but whose loss has not yet been | ||||
| acknowledged by the client (by using FREE_STATEID), the | ||||
| destination must be aware of this loss so that it can deny | ||||
| a request to reclaim them. | ||||
| </li> | ||||
| <li> | ||||
| For locks lost on the destination after the state transfer | ||||
| but before the client's RECLAIM_COMPLETE is done, the | ||||
| destination server should note these and not allow them to | ||||
| be reclaimed. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| An additional responsibility of the cooperating | ||||
| servers concerns situations | ||||
| in which a stateid cannot be transferred transparently because it | ||||
| conflicts with an existing stateid held by the client and | ||||
| associated with a different file system. In this case, there | ||||
| are two valid choices: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Treat the transfer, as in NFSv4.0, as one without Transparent | ||||
| State Migration. In this case, conflicting locks cannot be | ||||
| granted until the client does a RECLAIM_COMPLETE, after | ||||
| reclaiming the locks it had, with the exception of reclaims | ||||
| denied because they were attempts to reclaim locks that had | ||||
| been lost. | ||||
| </li> | ||||
| <li> | ||||
| Implement Transparent State Migration, except for the lock | ||||
| with the conflicting stateid. In this case, the client will | ||||
| be aware of a lost lock (through the SEQ4_STATUS flags) and be | ||||
| allowed to reclaim it. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When transferring state between the source and destination, the | ||||
| issues discussed in <xref target="RFC7931" sectionFormat="of" section="7.2"/> | ||||
| must still be attended to. In this case, the use of NFS4ERR_DELAY may still be | ||||
| necessary in NFSv4.1, as it was in NFSv4.0, to prevent locking | ||||
| state changing while it is being transferred. See | ||||
| <xref target="err_DELAY" format="default"/> for information about | ||||
| appropriate client retry approaches in the event that NFS4ERR_DELAY | ||||
| is returned. | ||||
| </t> | ||||
| <t> | ||||
| There are a number of important differences in the NFSv4.1 | ||||
| context: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The absence of RELEASE_LOCKOWNER means that the one case | ||||
| in which an operation could not be deferred by use of | ||||
| NFS4ERR_DELAY no longer exists. | ||||
| </li> | ||||
| <li> | ||||
| Sequencing of operations is no longer done using owner-based | ||||
| operation sequence numbers. Instead, sequencing is | ||||
| session-based. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As a result, when sessions are not transferred, the techniques | ||||
| discussed in <xref target="RFC7931" sectionFormat="of" section="7.2"/> | ||||
| are adequate and will not be further discussed. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-XS-session" numbered="true" toc="default"> | ||||
| <name>Server Responsibilities in Effecting Session Transfer</name> | ||||
| <t> | ||||
| The basic responsibility of the source server in effecting | ||||
| session transfer is to make available to the | ||||
| destination server a description of the current state of each | ||||
| slot within the session, including the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The last sequence value received for that slot. | ||||
| </li> | ||||
| <li> | ||||
| Whether there is cached reply data for the last request | ||||
| executed and, if so, the cached reply. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| When sessions are transferred, there are a number of issues that | ||||
| pose challenges in terms of making the transferred state | ||||
| unmodifiable during the period it is gathered up and | ||||
| transferred to the destination server: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A single session may be used to access multiple file systems, | ||||
| not all of which are being transferred. | ||||
| </li> | ||||
| <li> | ||||
| Requests made on a session may, even if rejected, affect | ||||
| the state of the session by advancing the sequence number | ||||
| associated with the slot used. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As a result, when the file system state might otherwise be | ||||
| considered unmodifiable, the client might have any number of | ||||
| in-flight requests, each of which is capable of changing session | ||||
| state, which may be of a number of types: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Those requests that were processed on the migrating file system | ||||
| before migration began. | ||||
| </li> | ||||
| <li> | ||||
| Those requests that received the error NFS4ERR_DELAY because the | ||||
| file system being accessed was in the process of being | ||||
| migrated. | ||||
| </li> | ||||
| <li> | ||||
| Those requests that received the error NFS4ERR_MOVED because the | ||||
| file system being accessed had been migrated. | ||||
| </li> | ||||
| <li> | ||||
| Those requests that accessed the migrating file system | ||||
| in order to obtain location or status information. | ||||
| </li> | ||||
| <li> | ||||
| Those requests that did not reference the migrating file system. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| It should be noted that the history of any particular slot is likely | ||||
| to include a number of these request classes. In the case in which | ||||
| a session that is migrated is used by file systems other than the | ||||
| one migrated, requests of class 5 may be common and may be the last | ||||
| request processed for many slots. | ||||
| </t> | ||||
| <t> | ||||
| Since session state can change even after the locking | ||||
| state has been fixed as part of the migration process, | ||||
| the session state known to the client could be different from that on | ||||
| the destination server, which necessarily reflects the session | ||||
| state on the source server at an earlier time. | ||||
| In deciding how to deal with this situation, it is helpful to | ||||
| distinguish between two sorts of behavioral consequences of | ||||
| the choice of initial sequence ID values: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The error NFS4ERR_SEQ_MISORDERED is returned when the sequence ID | ||||
| in a request is neither equal to the last one seen for the | ||||
| current slot nor the next greater one. | ||||
| </t> | ||||
| <t> | ||||
| In view of the difficulty of arriving at a mutually acceptable | ||||
| value for the correct last sequence value at the point of migration, | ||||
| it may be necessary for the server to show some degree of | ||||
| forbearance when the sequence ID is one that would be | ||||
| considered unacceptable if session migration were not | ||||
| involved. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Returning the cached reply for a previously executed | ||||
| request when the sequence ID | ||||
| in the request matches the last value recorded for the slot. | ||||
| </t> | ||||
| <t> | ||||
| In the cases in which an error is returned and there is no | ||||
| possibility of any non-idempotent operation having been executed, | ||||
| it may not be necessary to adhere to this as strictly as might | ||||
| be proper if session migration were not involved. For example, | ||||
| the fact that the error NFS4ERR_DELAY | ||||
| was returned may not assist the client in any material way, while | ||||
| the fact that NFS4ERR_MOVED was returned by the source server | ||||
| may not be relevant when the request was reissued and directed | ||||
| to the destination server. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| An important issue is that the specification needs to take note of | ||||
| all potential COMPOUNDs, even if they might be unlikely | ||||
| in practice. For example, a COMPOUND is allowed to access | ||||
| multiple file systems and might perform non-idempotent operations | ||||
| in some of them before accessing a file system being migrated. | ||||
| Also, a COMPOUND may return considerable data in the response | ||||
| before being rejected with NFS4ERR_DELAY or NFS4ERR_MOVED, and may | ||||
| in addition be marked as sa_cachethis. However, note that | ||||
| if the client and server adhere to rules in | ||||
| <xref target="err_DELAY" format="default"/>, there is no possibility of | ||||
| non-idempotent operations being spuriously reissued after receiving | ||||
| an NFS4ERR_DELAY response. | ||||
| </t> | ||||
| <t> | ||||
| To address these issues, a destination server <bcp14>MAY</bcp14> do any of | ||||
| the following when implementing session transfer: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Avoid enforcing any sequencing semantics for a particular slot | ||||
| until the client has established the starting sequence for that | ||||
| slot on the destination server. | ||||
| </li> | ||||
| <li> | ||||
| For each slot, avoid | ||||
| returning a cached reply containing NFS4ERR_DELAY or NFS4ERR_MOVED | ||||
| until the client has established the starting sequence for that | ||||
| slot on the destination server. | ||||
| </li> | ||||
| <li> | ||||
| Until the client has established the starting sequence for a | ||||
| particular slot on the destination server, avoid reporting | ||||
| NFS4ERR_SEQ_MISORDERED or returning a cached reply that contains | ||||
| either NFS4ERR_DELAY or NFS4ERR_MOVED and consists solely of | ||||
| a series of operations where the response is NFS4_OK until the | ||||
| final error. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Because of the considerations mentioned above, including the rules | ||||
| for the handling of NFS4ERR_DELAY included in | ||||
| <xref target="err_DELAY" format="default"/>, the destination | ||||
| server can respond appropriately to SEQUENCE operations received | ||||
| from the client by adopting the three policies listed below: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Not responding with NFS4ERR_SEQ_MISORDERED for the initial | ||||
| request on a slot within a transferred session because the | ||||
| destination server cannot be aware of requests made by the | ||||
| client after the server handoff but before the client became | ||||
| aware of the shift. In cases in which NFS4ERR_SEQ_MISORDERED | ||||
| would normally have been reported, the request is to be processed | ||||
| normally as a new request. | ||||
| </li> | ||||
| <li> | ||||
| Replying as it would for a retry whenever the sequence matches | ||||
| that transferred by the source server, even though this would | ||||
| not provide retry handling for requests issued after the server | ||||
| handoff, under the assumption that, when such requests are issued, | ||||
| they will never be responded to in a state-changing fashion, | ||||
| making retry support for them unnecessary. | ||||
| </li> | ||||
| <li> | ||||
| Once a non-retry SEQUENCE is received for a given slot, using | ||||
| that as the basis for further sequence checking, with no further | ||||
| reference to the sequence value transferred by the source server. | ||||
| </li> | ||||
| </ul> | ||||
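The three policies above can be sketched as per-slot logic on the destination server. This is one possible reading, not a definitive implementation: the `Slot` class and `handle_sequence` function are hypothetical, replies are reduced to strings, and the sketch assumes 32-bit sequence IDs that wrap around.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MASK = 0xFFFFFFFF  # assumption: sequence IDs are 32-bit and wrap


@dataclass
class Slot:
    last_seq: int                # initially the value transferred by the source
    cached_reply: Optional[str]  # reply cached for last_seq, if any
    established: bool = False    # True once a non-retry SEQUENCE has been seen


def handle_sequence(slot: Slot, seqid: int) -> str:
    """Classify an incoming SEQUENCE on a slot of a transferred session."""
    if seqid == slot.last_seq:
        # Second policy: treat a match as a retry (serve slot.cached_reply).
        return "retry"
    if slot.established and seqid != (slot.last_seq + 1) & SEQ_MASK:
        # Normal checking applies only after the starting sequence is set.
        return "NFS4ERR_SEQ_MISORDERED"
    # First policy: the initial request is never rejected as misordered.
    # Third policy: this value becomes the basis for all later checks.
    slot.last_seq = seqid
    slot.cached_reply = None
    slot.established = True
    return "new"
```

Keeping a single `established` flag per slot means the transferred sequence value is consulted only until the first non-retry request arrives, which matches the third policy.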
| </section> | ||||
| </section> | ||||
| <section anchor="effecting_referrals" numbered="true" toc="default"> | ||||
| <name>Effecting File System Referrals</name> | ||||
| <t> | ||||
| Referrals are effected when an absent file system is encountered | ||||
| and one or more alternate locations are made available by the | ||||
| fs_locations or fs_locations_info attributes. The client will | ||||
| typically get an NFS4ERR_MOVED error, fetch the appropriate | ||||
| location information, and proceed to access the file system on | ||||
| a different server, even though it retains its logical position | ||||
| within the original namespace. Referrals differ from migration | ||||
| events in that they happen only when the client has not | ||||
| previously referenced the file system in question (so there | ||||
| is nothing to transition). Referrals can only come into | ||||
| effect when an absent file system is encountered at its | ||||
| root. | ||||
| </t> | ||||
| <t> | ||||
| The examples given in the sections below are somewhat artificial in | ||||
| that an actual client will not typically do a multi-component | ||||
| lookup, but will have cached information regarding the upper levels | ||||
| of the name hierarchy. However, these examples are chosen to make | ||||
| the required behavior clear and easy to put within the scope of a | ||||
| small number of requests, without getting into a discussion of the details of | ||||
| how specific clients might choose to cache things. | ||||
| </t> | ||||
| <section anchor="referrals_lookup" numbered="true" toc="default"> | ||||
| <name>Referral Example (LOOKUP)</name> | ||||
| <t> | ||||
| Let us suppose that the following COMPOUND is sent in an | ||||
| environment in which /this/is/the/path is absent from the | ||||
| target server. This may be for a number of reasons. It may | ||||
| be that the file system has moved, or it may be that | ||||
| the target server is functioning mainly, or solely, to refer | ||||
| clients to the servers on which various file systems are located. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "path" | ||||
| </li> | ||||
| <li> | ||||
| GETFH | ||||
| </li> | ||||
| <li> | ||||
| GETATTR (fsid, fileid, size, time_modify) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Under the given circumstances, the following will be the result. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH --> NFS_OK. The current fh is now the root of | ||||
| the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" --> NFS_OK. The current fh is for /this and is | ||||
| within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" --> NFS_OK. The current fh is for /this/is | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "path" --> NFS_OK. The current fh is for | ||||
| /this/is/the/path and is within a new, absent file system, but ... | ||||
| the client will never see the value of that fh. | ||||
| </li> | ||||
| <li> | ||||
| GETFH --> NFS4ERR_MOVED. | ||||
| Fails because current fh is in an absent file system at the start of | ||||
| the operation, and the specification makes no exception for GETFH. | ||||
| </li> | ||||
| <li> | ||||
| GETATTR (fsid, fileid, size, time_modify). | ||||
| Not executed because the failure of the GETFH stops processing | ||||
| of the COMPOUND. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Given the failure of the GETFH, the client has the job of | ||||
| determining the root of the absent file system and where to find | ||||
| that file system, i.e., the server and path relative to that | ||||
| server's root fh. Note that in this example, the client did | ||||
| not obtain filehandles and attribute information (e.g., fsid) for | ||||
| the intermediate directories, so that it would not be sure where | ||||
| the absent file system starts. It could be the case, for example, | ||||
| that /this/is/the is the root of the moved file system and that | ||||
| the reason that the lookup of "path" succeeded is that the | ||||
| file system was not absent on that operation but was moved between the last | ||||
| LOOKUP and the GETFH (since COMPOUND is not atomic). Even if we | ||||
| had the fsids for all of the intermediate directories, we could | ||||
| have no way of knowing that /this/is/the/path was the root of a | ||||
| new file system, since we don't yet have its fsid. | ||||
| </t> | ||||
| <t> | ||||
| In order to get the necessary information, let us re-send the | ||||
| chain of LOOKUPs with GETFHs and GETATTRs to at least get the | ||||
| fsids so we can be sure where the appropriate file system boundaries are. | ||||
| The client could choose to get fs_locations_info | ||||
| at the same time but in | ||||
| most cases the client will have a good guess as to where file system | ||||
| boundaries are (because of where NFS4ERR_MOVED was, and was not, | ||||
| received) making fetching of fs_locations_info unnecessary. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>OP01:</dt> | ||||
| <dd><t>PUTROOTFH --> NFS_OK</t> | ||||
| <ul><li>Current fh is root of pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP02:</dt> | ||||
| <dd><t>GETATTR(fsid) --> NFS_OK</t> | ||||
| <ul><li>Just for completeness. Normally, clients will know the fsid | ||||
| of the pseudo-fs as soon as they establish communication with | ||||
| a server.</li></ul> | ||||
| </dd> | ||||
| <dt>OP03:</dt> | ||||
| <dd>LOOKUP "this" --> NFS_OK</dd> | ||||
| <dt>OP04:</dt> | ||||
| <dd><t>GETATTR(fsid) --> NFS_OK</t> | ||||
| <ul><li> | ||||
| Get current fsid to see where file system boundaries are. The fsid | ||||
| will be that for the pseudo-fs in this example, so no | ||||
| boundary.</li></ul> | ||||
| </dd> | ||||
| <dt>OP05:</dt> | ||||
| <dd><t>GETFH --> NFS_OK</t> | ||||
| <ul><li>Current fh is for /this and is within pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP06:</dt> | ||||
| <dd><t>LOOKUP "is" --> NFS_OK</t> | ||||
| <ul><li>Current fh is for /this/is and is within pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP07:</dt> | ||||
| <dd><t>GETATTR(fsid) --> NFS_OK</t> | ||||
| <ul><li> | ||||
| Get current fsid to see where file system boundaries are. The fsid | ||||
| will be that for the pseudo-fs in this example, so no | ||||
| boundary.</li></ul> | ||||
| </dd> | ||||
| <dt>OP08:</dt> | ||||
| <dd> | ||||
| <t>GETFH --> NFS_OK</t> | ||||
| <ul><li>Current fh is for /this/is and is within pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP09:</dt> | ||||
| <dd> | ||||
| <t>LOOKUP "the" --> NFS_OK</t> | ||||
| <ul><li> | ||||
| Current fh is for /this/is/the and is within pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP10:</dt> | ||||
| <dd> | ||||
| <t>GETATTR(fsid) --> NFS_OK</t> | ||||
| <ul><li> | ||||
| Get current fsid to see where file system boundaries are. The fsid | ||||
| will be that for the pseudo-fs in this example, so no | ||||
| boundary.</li></ul> | ||||
| </dd> | ||||
| <dt>OP11:</dt> | ||||
| <dd> | ||||
| <t>GETFH --> NFS_OK</t> | ||||
| <ul><li>Current fh is for /this/is/the and is within pseudo-fs.</li></ul> | ||||
| </dd> | ||||
| <dt>OP12:</dt> | ||||
| <dd> | ||||
| <t>LOOKUP "path" --> NFS_OK</t> | ||||
| <ul><li> | ||||
| Current fh is for /this/is/the/path and is within a new, | ||||
| absent file system, but ...</li> | ||||
| <li> | ||||
| The client will never see the value of that fh.</li></ul> | ||||
| </dd> | ||||
| <dt>OP13:</dt> | ||||
| <dd> | ||||
| <t>GETATTR(fsid, fs_locations_info) --> NFS_OK</t> | ||||
| <ul><li> | ||||
| We are getting the fsid to know where the file system boundaries are. | ||||
| In this operation, the fsid will be different than that of the | ||||
| parent directory (which in turn was retrieved in OP10). | ||||
| Note that the fsid we are given will not necessarily be preserved at the new | ||||
| location. That fsid might be different, and in fact the fsid | ||||
| we have for this file system might be a valid fsid of a different | ||||
| file system on that new server.</li> | ||||
| <li> | ||||
| In this particular case, we are pretty sure anyway that what | ||||
| has moved is /this/is/the/path rather than /this/is/the | ||||
| since we have the fsid of the latter and it is that of the | ||||
| pseudo-fs, which presumably cannot move. However, in other | ||||
| examples, we might not have this kind of information to rely | ||||
| on (e.g., /this/is/the might be a non-pseudo file system | ||||
| separate from /this/is/the/path), so we need to have | ||||
| another reliable source of information on the boundary of the file system | ||||
| that is moved. If, for example, the file system /this/is | ||||
| had moved, we would have a case of migration rather than | ||||
| referral, and once the boundaries of the migrated file system | ||||
| were clear we could fetch fs_locations_info.</li> | ||||
| <li> | ||||
| We are fetching fs_locations_info because the fact that we got an | ||||
| NFS4ERR_MOVED at this point means that it is most likely that | ||||
| this is a referral and we need the destination. Even if it is | ||||
| the case that /this/is/the is a file system that has | ||||
| migrated, we will still need the location information for that | ||||
| file system.</li></ul></dd> | ||||
| <dt>OP14:</dt> | ||||
| <dd> | ||||
| <t>GETFH --> NFS4ERR_MOVED</t> | ||||
| <ul><li> | ||||
| Fails because current fh is in an absent file system at the start of | ||||
| the operation, and the specification makes no exception for GETFH. Note | ||||
| that this means the server will never send the client a | ||||
| filehandle from within an absent file system.</li></ul> | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| Given the above, the client knows where the root of the absent file | ||||
| system is (/this/is/the/path) by noting where the change of | ||||
| fsid occurred (between "the" and "path"). The | ||||
| fs_locations_info attribute also gives the client the | ||||
| actual location of | ||||
| the absent file system, so that the referral can proceed. The | ||||
| server gives the client the bare minimum of information about the | ||||
| absent file system so that there will be very little scope for | ||||
| problems of conflict between information sent by the referring | ||||
| server and information from the file system's home. No filehandles | ||||
| and very few attributes are present on the referring server, and the | ||||
| client can treat those it receives as transient | ||||
| information with the function of enabling the referral. | ||||
| </t> | ||||
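| <t> | ||||
| The boundary determination described above can be sketched in Python. | ||||
| This is an illustrative sketch only and not part of the protocol: the | ||||
| function name find_fs_boundary, the representation of each step as a | ||||
| (component, fsid) pair, and the sample fsid values are assumptions made | ||||
| for the example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
from typing import Optional, Sequence, Tuple

Fsid = Tuple[int, int]  # (major, minor); the sample values below are made up


def find_fs_boundary(steps: Sequence[Tuple[str, Fsid]]) -> Optional[str]:
    """Return the path at which the fsid first changes, or None.

    `steps` lists (component, fsid) pairs in lookup order. The first
    pair uses an empty component and carries the fsid of the root.
    """
    path = ""
    previous = None
    for component, fsid in steps:
        if component:
            path += "/" + component
        if previous is not None and fsid != previous:
            return path  # first object whose fsid differs from its parent's
        previous = fsid
    return None


# fsids as returned by OP02, OP04, OP07, OP10, and OP13 (hypothetical values)
steps = [("", (0, 1)), ("this", (0, 1)), ("is", (0, 1)),
         ("the", (0, 1)), ("path", (9, 42))]
print(find_fs_boundary(steps))  # /this/is/the/path
```

| ]]></sourcecode> | ||||
| <t> | ||||
| A real client would typically rely on cached fsid values for the upper | ||||
| levels of the hierarchy rather than re-fetching them in one COMPOUND. | ||||
| </t> | ||||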
| </section> | ||||
| <section anchor="referrals_readdir" numbered="true" toc="default"> | ||||
| <name>Referral Example (READDIR)</name> | ||||
| <t> | ||||
| Another context in which a client may encounter referrals is when | ||||
| it does a READDIR on a directory in which some of the sub-directories | ||||
| are the roots of absent file systems. | ||||
| </t> | ||||
| <t> | ||||
| Suppose such a directory is read as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" | ||||
| </li> | ||||
| <li> | ||||
| READDIR (fsid, size, time_modify, mounted_on_fileid) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In this case, because rdattr_error is not requested, | ||||
| fs_locations_info | ||||
| is not requested, and some of the attributes cannot be provided, the | ||||
| result will be an NFS4ERR_MOVED error on the READDIR, with the | ||||
| detailed results as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH --> NFS_OK. The current fh is at the root of the | ||||
| pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" --> NFS_OK. The current fh is for /this and is | ||||
| within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" --> NFS_OK. The current fh is for /this/is | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| READDIR (fsid, size, time_modify, mounted_on_fileid) --> | ||||
| NFS4ERR_MOVED. Note that the same error would have been | ||||
| returned if /this/is/the had migrated, but it is returned because the | ||||
| directory contains the root of an absent file system. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| So now suppose that we re-send with rdattr_error: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" | ||||
| </li> | ||||
| <li> | ||||
| READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The results will be: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH --> NFS_OK. The current fh is at the root of the | ||||
| pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" --> NFS_OK. The current fh is for /this and is | ||||
| within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" --> NFS_OK. The current fh is for /this/is | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) | ||||
| --> NFS_OK. The attributes for directory entry with the | ||||
| component named "path" will only contain | ||||
| rdattr_error | ||||
| with the value NFS4ERR_MOVED, together with an fsid | ||||
| value and a value for mounted_on_fileid. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Suppose we do another READDIR to get fs_locations_info (although | ||||
| we could have used a GETATTR directly, as in | ||||
| <xref target="referrals_lookup" format="default"/>). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" | ||||
| </li> | ||||
| <li> | ||||
| READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, | ||||
| size, time_modify) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The results would be: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| PUTROOTFH --> NFS_OK. The current fh is at the root of the | ||||
| pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "this" --> NFS_OK. The current fh is for /this and is | ||||
| within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "is" --> NFS_OK. The current fh is for /this/is | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the | ||||
| and is within the pseudo-fs. | ||||
| </li> | ||||
| <li> | ||||
| READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, | ||||
| size, time_modify) --> NFS_OK. The attributes will be as shown below. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The attributes for the directory entry with the | ||||
| component named "path" will only contain: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| rdattr_error (value: NFS_OK) | ||||
| </li> | ||||
| <li> | ||||
| fs_locations_info | ||||
| </li> | ||||
| <li> | ||||
| mounted_on_fileid (value: unique fileid within referring file system) | ||||
| </li> | ||||
| <li> | ||||
| fsid (value: unique value within referring server) | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The attributes for entry "path" will not contain size or | ||||
| time_modify because these attributes are not available within an | ||||
| absent file system. | ||||
| </t> | ||||
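| <t> | ||||
| As a rough illustration of how a client might act on the second READDIR | ||||
| above (the one requesting rdattr_error but not fs_locations_info), the | ||||
| Python sketch below picks out the entries whose rdattr_error is | ||||
| NFS4ERR_MOVED. The dictionary layout, the function name | ||||
| referral_candidates, the entry named "other", and all attribute values | ||||
| are hypothetical; only the numeric value of NFS4ERR_MOVED comes from the | ||||
| protocol definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
NFS4ERR_MOVED = 10019  # nfsstat4 value for NFS4ERR_MOVED


def referral_candidates(entries):
    """Return names of entries whose rdattr_error is NFS4ERR_MOVED.

    `entries` maps each directory entry name to a dict of the
    attributes actually returned for it (a hypothetical decoding).
    """
    return sorted(name for name, attrs in entries.items()
                  if attrs.get("rdattr_error") == NFS4ERR_MOVED)


# Hypothetical decoded result of the READDIR on /this/is/the.
entries = {
    "path": {"rdattr_error": NFS4ERR_MOVED, "fsid": (9, 42),
             "mounted_on_fileid": 1234},
    "other": {"rdattr_error": 0, "fsid": (0, 1), "size": 4096,
              "time_modify": 1593561600, "mounted_on_fileid": 77},
}
print(referral_candidates(entries))  # ['path']
```

| ]]></sourcecode> | ||||
| <t> | ||||
| For each name returned, the client could then request fs_locations_info, | ||||
| either with a further READDIR or with a GETATTR. | ||||
| </t> | ||||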
| </section> | ||||
| </section> | ||||
| <section anchor="fs_locations" numbered="true" toc="default"> | ||||
| <name>The Attribute fs_locations</name> | ||||
| <t> | ||||
| The fs_locations attribute is structured in the following way: | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct fs_location4 { | ||||
| utf8str_cis server<>; | ||||
| pathname4 rootpath; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct fs_locations4 { | ||||
| pathname4 fs_root; | ||||
| fs_location4 locations<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The fs_location4 data type is used to represent the location of a | ||||
| file system by providing a server name and the path to the root | ||||
| of the file system within that server's namespace. | ||||
| When a set of servers have corresponding file systems at the | ||||
| same path within their namespaces, an array of server names may | ||||
| be provided. An | ||||
| entry in the server array is a UTF-8 string and represents one | ||||
| of a | ||||
| traditional DNS host name, IPv4 address, IPv6 address, or a | ||||
| zero-length string. | ||||
| An IPv4 or IPv6 address is represented as a universal | ||||
| address (see <xref target="netaddr4" format="default"/> and <xref target="RFC5665" format="default"/>), minus the netid, and either with | ||||
| or without the trailing ".p1.p2" suffix that | ||||
| represents the port number. If the suffix is omitted, | ||||
| then the default port, 2049, <bcp14>SHOULD</bcp14> be assumed. | ||||
| A zero-length string <bcp14>SHOULD</bcp14> be used to indicate the current address | ||||
| being used for the RPC call. It is not | ||||
| a requirement that all servers that share the same rootpath | ||||
| be listed | ||||
| in one fs_location4 instance. The array of server names is provided for | ||||
| convenience. Servers that share the same rootpath may also be listed | ||||
| in separate fs_location4 entries in the fs_locations attribute. | ||||
| </t> | ||||
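| <t> | ||||
| The handling of one entry of the server array might be sketched as | ||||
| follows in Python. This is a sketch under stated assumptions, not a | ||||
| definitive parser: the function name parse_server_entry is invented here, | ||||
| the port is computed as p1 * 256 + p2 following the universal address | ||||
| convention, and the use of port 2049 for DNS host names is an assumption | ||||
| of this example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
import ipaddress

DEFAULT_NFS_PORT = 2049


def _is_ip(text: str) -> bool:
    """True if `text` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def parse_server_entry(entry: str, current: tuple) -> tuple:
    """Return (host, port) for one fs_location4 server entry.

    `current` is the (host, port) already in use for the RPC call; it
    is returned for a zero-length entry.
    """
    if entry == "":
        return current
    if _is_ip(entry):
        return entry, DEFAULT_NFS_PORT  # address with the suffix omitted
    pieces = entry.rsplit(".", 2)  # try to peel off a ".p1.p2" suffix
    if (len(pieces) == 3 and pieces[1].isdecimal()
            and pieces[2].isdecimal() and _is_ip(pieces[0])):
        p1, p2 = int(pieces[1]), int(pieces[2])
        if p1 < 256 and p2 < 256:
            return pieces[0], p1 * 256 + p2
    return entry, DEFAULT_NFS_PORT  # assumed to be a DNS host name


current = ("198.51.100.7", 2049)
print(parse_server_entry("192.0.2.10.8.2", current))   # ('192.0.2.10', 2050)
print(parse_server_entry("2001:db8::1.8.1", current))  # ('2001:db8::1', 2049)
```

| ]]></sourcecode> | ||||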
| <t> | ||||
| The fs_locations4 data type and the fs_locations attribute each | ||||
| contain an array of | ||||
| such locations. Since the namespace of each server may be | ||||
| constructed differently, the "fs_root" field is provided. The | ||||
| path represented | ||||
| by fs_root represents the location of the file system in the | ||||
| current server's namespace, i.e., that of the | ||||
| server from which the fs_locations attribute was obtained. The | ||||
| fs_root path is meant to aid the client by clearly referencing | ||||
| the root of the file system whose locations are being reported, | ||||
| no matter what object within the current file system the | ||||
| current filehandle designates. The fs_root is simply the | ||||
| pathname the client used to reach the object on the current server | ||||
| (i.e., the object to which the fs_locations attribute applies). | ||||
| </t> | ||||
| <t> | ||||
| When the fs_locations attribute | ||||
| is interrogated and there are no alternate file system locations, | ||||
| the server <bcp14>SHOULD</bcp14> return a zero-length array of fs_location4 | ||||
| structures, together with a valid fs_root. | ||||
| </t> | ||||
| <t> | ||||
| As an example, suppose there is a replicated file system located | ||||
| at two | ||||
| servers (servA and servB). At servA, the file system is located at | ||||
| path /a/b/c. At servB, the file system is located at path /x/y/z. | ||||
| If the client were to obtain the fs_locations value for the | ||||
| directory at /a/b/c/d, it might not necessarily know | ||||
| that the file system's root is located in servA's namespace | ||||
| at /a/b/c. When the client switches to servB, it will need | ||||
| to determine that the directory it first referenced at servA is now | ||||
| represented by the path /x/y/z/d on servB. To facilitate this, the | ||||
| fs_locations attribute provided by servA would have an fs_root value | ||||
| of /a/b/c and two entries in fs_locations. One entry in fs_locations | ||||
| will be for itself (servA) and the other will be for servB with a | ||||
| path of /x/y/z. With this information, the client is able to | ||||
| substitute /x/y/z for the /a/b/c at the beginning of its access | ||||
| path and construct /x/y/z/d to use for the new server. | ||||
| </t> | ||||
| <t> | ||||
| Note that there is no requirement that the number | ||||
| of components in each rootpath be the same; there | ||||
| is no relation between the number of components in | ||||
| rootpath or fs_root, and none of the components | ||||
| in a rootpath and fs_root have to be the same. In | ||||
| the above example, we could have had a third element | ||||
| in the locations array, with server equal to "servC" | ||||
| and rootpath equal to "/I/II", and a fourth element in | ||||
| locations with server equal to "servD" and rootpath | ||||
| equal to "/aleph/beth/gimel/daleth/he". | ||||
| </t> | ||||
| <t> | ||||
| The relationship of fs_root to a rootpath is | ||||
| that the client replaces the pathname indicated in | ||||
| fs_root for the current server with the substitute | ||||
| indicated in rootpath for the new server. | ||||
| </t> | ||||
| <t> | ||||
| For an example of a referred or migrated file | ||||
| system, suppose there is a file system located | ||||
| at serv1. At serv1, the file system is located at | ||||
| /az/buky/vedi/glagoli. The client finds that object | ||||
| at glagoli has migrated (or is a referral). The | ||||
| client gets the fs_locations attribute, which contains | ||||
| an fs_root of /az/buky/vedi/glagoli, and one element | ||||
| in the locations array, with server equal to serv2, | ||||
| and rootpath equal to /izhitsa/fita. The client | ||||
| replaces /az/buky/vedi/glagoli with /izhitsa/fita, | ||||
| and uses the latter pathname on serv2. | ||||
| </t> | ||||
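| <t> | ||||
| The substitution of rootpath for fs_root can be sketched in Python as | ||||
| shown here. The helper names split_path and translate_path are invented | ||||
| for this example, and slash-separated strings are used only for | ||||
| readability; on the wire, a pathname4 is an array of components. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
def split_path(path: str) -> list:
    """Split a slash-separated path into its components."""
    return [component for component in path.split("/") if component]


def translate_path(path: str, fs_root: str, rootpath: str) -> str:
    """Replace the leading fs_root of `path` with `rootpath`."""
    parts, root = split_path(path), split_path(fs_root)
    if parts[:len(root)] != root:
        raise ValueError("path does not lie under fs_root")
    return "/" + "/".join(split_path(rootpath) + parts[len(root):])


# Replication example: servA:/a/b/c/d becomes servB:/x/y/z/d
print(translate_path("/a/b/c/d", "/a/b/c", "/x/y/z"))
# Migration or referral example: serv1 to serv2
print(translate_path("/az/buky/vedi/glagoli",
                     "/az/buky/vedi/glagoli", "/izhitsa/fita"))
```

| ]]></sourcecode> | ||||
| <t> | ||||
| The two calls print /x/y/z/d and /izhitsa/fita, matching the examples | ||||
| given in the text. | ||||
| </t> | ||||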
| <t> | ||||
| Thus, the server <bcp14>MUST</bcp14> return an fs_root that is equal | ||||
| to the path the client used to reach the object to which the | ||||
| fs_locations attribute applies. Otherwise, the | ||||
| client cannot determine the new path to use on the new server. | ||||
| </t> | ||||
| <t> | ||||
| Since the fs_locations attribute lacks information defining various | ||||
| attributes of the various file system choices presented, it <bcp14>SHOULD</bcp14> | ||||
| only be interrogated and used when fs_locations_info is not available. | ||||
| When fs_locations is used, information about the | ||||
| specific locations should be assumed based on the following rules. | ||||
| </t> | ||||
| <t> | ||||
| The following rules are general and apply irrespective of the | ||||
| context. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| All listed | ||||
| file system instances should be considered as of the | ||||
| same handle class if and only if the | ||||
| current fh_expire_type attribute does not include the | ||||
| FH4_VOL_MIGRATION | ||||
| bit. Note that in the case of referral, filehandle issues do | ||||
| not apply since there can be no filehandles known within the | ||||
| current file system, nor is there any access to the fh_expire_type | ||||
| attribute on the referring (absent) file system. | ||||
| </li> | ||||
| <li> | ||||
| All listed file system instances should be considered as of the | ||||
| same fileid class if and only if the | ||||
| fh_expire_type attribute indicates persistent filehandles and | ||||
| does not include the FH4_VOL_MIGRATION | ||||
| bit. Note that in the case of referral, fileid issues do | ||||
| not apply since there can be no fileids known within the | ||||
| referring (absent) file system, nor is there any access to | ||||
| the fh_expire_type attribute. | ||||
| </li> | ||||
| <li> | ||||
| All listed file system instances | ||||
| should be considered as of different | ||||
| change classes. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| For other class assignments, handling of file system | ||||
| transitions depends on the reasons for the transition: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When the transition is due to migration, that is, the client was | ||||
| directed to a new file system after receiving an NFS4ERR_MOVED error, | ||||
| the target should be | ||||
| treated as being of the same | ||||
| write-verifier class as the source. | ||||
| </li> | ||||
| <li> | ||||
| When the transition is due to failover to another replica, | ||||
| that is, the client selected another replica without | ||||
| receiving an NFS4ERR_MOVED error, the target should be | ||||
| treated as being of a different | ||||
| write-verifier class from the source. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The specific choices reflect typical implementation patterns for | ||||
| controlled migration and failover, respectively. Since other | ||||
| choices are possible and useful, this information is better | ||||
| obtained by using fs_locations_info. When a server implementation | ||||
| needs to communicate other choices, it <bcp14>MUST</bcp14> support the | ||||
| fs_locations_info attribute. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="SECCON" format="default"/> for a | ||||
| discussion on the recommendations for the security | ||||
| flavor to be used by any GETATTR operation that | ||||
| requests the fs_locations attribute. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-li-new" numbered="true" toc="default"> | ||||
| <name>The Attribute fs_locations_info</name> | ||||
| <t> | ||||
| The fs_locations_info attribute is intended as a more functional | ||||
| replacement for the fs_locations attribute, which will continue to exist | ||||
| and be supported. Clients can use it to get a more complete set of | ||||
| data about alternative file system locations, including additional | ||||
| network paths to access replicas in use and additional replicas. | ||||
| When the server does not support | ||||
| fs_locations_info, fs_locations can be used to get a subset of the | ||||
| data. A server that supports fs_locations_info <bcp14>MUST</bcp14> support | ||||
| fs_locations as well. | ||||
| </t> | ||||
| <t> | ||||
| There is additional data present in | ||||
| fs_locations_info that is not available in fs_locations: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Attribute continuity information. This information | ||||
| will allow a client to select a | ||||
| replica that meets the transparency requirements of the | ||||
| applications accessing the data and to leverage | ||||
| optimizations due to the server guarantees of attribute | ||||
| continuity (e.g., if the | ||||
| change attribute of a file of the file system is continuous | ||||
| between multiple replicas, | ||||
| the client does not have to invalidate the file's cache | ||||
| when switching to a different replica). | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| File system identity information that indicates when multiple | ||||
| replicas, from the client's point of view, correspond to the | ||||
| same target file system, allowing them to be used | ||||
| interchangeably, without disruption, as distinct synchronized | ||||
| replicas of the same file data. | ||||
| </t> | ||||
| <t> | ||||
| Note that having two replicas with common identity information is | ||||
| distinct from the case of two (trunked) paths to the same | ||||
| replica. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Information that will bear on the suitability of various | ||||
| replicas, depending on the use that the client intends. For | ||||
| example, many applications need an absolutely up-to-date copy | ||||
| (e.g., those that write), while others may only need access to | ||||
| the most up-to-date copy reasonably available. | ||||
| </li> | ||||
| <li> | ||||
| Server-derived preference information for replicas, which can | ||||
| be used to implement load-balancing while giving the client | ||||
| the entire file system list to be used in case the primary fails. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The fs_locations_info attribute is structured similarly to the | ||||
| fs_locations attribute. A top-level structure | ||||
| (fs_locations_info4) contains the entire attribute including the root | ||||
| pathname of the file system and an array of lower-level structures that | ||||
| define replicas that share a common rootpath on their respective | ||||
| servers. The lower-level structure in turn | ||||
| (fs_locations_item4) contains a specific pathname and information on one | ||||
| or more individual network access paths. For that last, lowest level, | ||||
| fs_locations_info has an fs_locations_server4 | ||||
| structure that contains per-server-replica information in addition | ||||
| to the file system | ||||
| location entry. This per-server-replica information includes a | ||||
| nominally opaque array, fls_info, within which specific pieces | ||||
| of information are located at the specific indices listed below. | ||||
| </t> | ||||
| <t> | ||||
| Two fs_locations_server4 entries that are within different | ||||
| fs_locations_item4 structures are never trunkable, while two entries | ||||
| within the same fs_locations_item4 structure might or might not be | ||||
| trunkable. Two entries that are trunkable will have identical | ||||
| identity information, although, as noted above, the converse is | ||||
| not the case. | ||||
| </t> | ||||
| <t> | ||||
| The attribute will always contain at least a single fs_locations_server4 | ||||
| entry. Typically, there will be an entry with the FSLI4GF_CUR_REQ | ||||
| flag set, although in the case of a referral there will be no | ||||
| entry with that flag set. | ||||
| </t> | ||||
| <t> | ||||
| It should be noted that fs_locations_info attributes returned by | ||||
| servers for various replicas may differ for various reasons. | ||||
| One server may know about a set of replicas that are not known to | ||||
| other servers. Further, compatibility attributes may differ. | ||||
| Filehandles might be of the same class going from replica A to | ||||
| replica B but not going in the reverse direction. This might happen | ||||
| because the filehandles are the same, but | ||||
| replica B's server implementation might not have provision to note | ||||
| and report that equivalence. | ||||
| </t> | ||||
| <t> | ||||
| The fs_locations_info attribute consists of a root | ||||
| pathname (fli_fs_root, just like fs_root in the | ||||
| fs_locations attribute), together with an array of | ||||
| fs_locations_item4 structures. The fs_locations_item4 | ||||
| structures in turn consist of a root pathname | ||||
| (fli_rootpath) together with an array (fli_entries) | ||||
| of elements of data type fs_locations_server4, | ||||
| all defined as follows. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * Defines an individual server access path | ||||
| */ | ||||
| struct fs_locations_server4 { | ||||
| int32_t fls_currency; | ||||
| opaque fls_info<>; | ||||
| utf8str_cis fls_server; | ||||
| }; | ||||
| /* | ||||
| * Byte indices of items within | ||||
| * fls_info: flag fields, class numbers, | ||||
| * bytes indicating ranks and orders. | ||||
| */ | ||||
| const FSLI4BX_GFLAGS = 0; | ||||
| const FSLI4BX_TFLAGS = 1; | ||||
| const FSLI4BX_CLSIMUL = 2; | ||||
| const FSLI4BX_CLHANDLE = 3; | ||||
| const FSLI4BX_CLFILEID = 4; | ||||
| const FSLI4BX_CLWRITEVER = 5; | ||||
| const FSLI4BX_CLCHANGE = 6; | ||||
| const FSLI4BX_CLREADDIR = 7; | ||||
| const FSLI4BX_READRANK = 8; | ||||
| const FSLI4BX_WRITERANK = 9; | ||||
| const FSLI4BX_READORDER = 10; | ||||
| const FSLI4BX_WRITEORDER = 11; | ||||
| /* | ||||
| * Bits defined within the general flag byte. | ||||
| */ | ||||
| const FSLI4GF_WRITABLE = 0x01; | ||||
| const FSLI4GF_CUR_REQ = 0x02; | ||||
| const FSLI4GF_ABSENT = 0x04; | ||||
| const FSLI4GF_GOING = 0x08; | ||||
| const FSLI4GF_SPLIT = 0x10; | ||||
| /* | ||||
| * Bits defined within the transport flag byte. | ||||
| */ | ||||
| const FSLI4TF_RDMA = 0x01; | ||||
| /* | ||||
| * Defines a set of replicas sharing | ||||
| * a common value of the rootpath | ||||
| * within the corresponding | ||||
| * single-server namespaces. | ||||
| */ | ||||
| struct fs_locations_item4 { | ||||
| fs_locations_server4 fli_entries<>; | ||||
| pathname4 fli_rootpath; | ||||
| }; | ||||
| /* | ||||
| * Defines the overall structure of | ||||
| * the fs_locations_info attribute. | ||||
| */ | ||||
| struct fs_locations_info4 { | ||||
| uint32_t fli_flags; | ||||
| int32_t fli_valid_for; | ||||
| pathname4 fli_fs_root; | ||||
| fs_locations_item4 fli_items<>; | ||||
| }; | ||||
| /* | ||||
| * Flag bits in fli_flags. | ||||
| */ | ||||
| const FSLI4IF_VAR_SUB = 0x00000001; | ||||
| typedef fs_locations_info4 fattr4_fs_locations_info; | ||||
| ]]></sourcecode> | ||||
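| <t> | ||||
| To illustrate how the byte indices and flag bits defined above might be | ||||
| applied to an fls_info array, a small Python sketch follows. The constant | ||||
| values are copied from the XDR; the function names, the sample bytes, and | ||||
| the choice to report a missing byte as None are assumptions of this | ||||
| example rather than protocol requirements. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
# Byte indices within fls_info, copied from the XDR above.
FSLI4BX = {
    "GFLAGS": 0, "TFLAGS": 1, "CLSIMUL": 2, "CLHANDLE": 3,
    "CLFILEID": 4, "CLWRITEVER": 5, "CLCHANGE": 6, "CLREADDIR": 7,
    "READRANK": 8, "WRITERANK": 9, "READORDER": 10, "WRITEORDER": 11,
}
# Bits within the general flag byte, copied from the XDR above.
FSLI4GF = {
    "WRITABLE": 0x01, "CUR_REQ": 0x02, "ABSENT": 0x04,
    "GOING": 0x08, "SPLIT": 0x10,
}


def decode_fls_info(info: bytes) -> dict:
    """Map each defined index name to its byte, or None if absent."""
    return {name: (info[i] if i < len(info) else None)
            for name, i in FSLI4BX.items()}


def general_flags(info: bytes) -> set:
    """Return the names of the bits set in the general flag byte."""
    gflags = decode_fls_info(info)["GFLAGS"] or 0
    return {name for name, bit in FSLI4GF.items() if gflags & bit}


sample = bytes([0x03, 0x00, 1, 1, 1, 2, 2, 1, 0, 0, 5, 5])  # made-up bytes
print(sorted(general_flags(sample)))         # ['CUR_REQ', 'WRITABLE']
print(decode_fls_info(sample)["READORDER"])  # 5
```

| ]]></sourcecode> | ||||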
| <t> | ||||
| As noted above, the fs_locations_info attribute, when supported, may | ||||
| be requested of absent file systems without causing NFS4ERR_MOVED to | ||||
| be returned. It is generally expected that it will be available for | ||||
| both present and absent file systems even if only a single | ||||
| fs_locations_server4 entry is present, designating the current (present) | ||||
| file system, or two fs_locations_server4 entries designating the | ||||
| previous location of an absent file system (the one just referenced) and its | ||||
| successor location. Servers are strongly urged to support this | ||||
| attribute on all file systems if they support it on any file system. | ||||
| </t> | ||||
| <t> | ||||
| The data presented in the fs_locations_info attribute may be obtained | ||||
| by the server in any number of ways, including specification by | ||||
| the administrator or by current protocols for transferring data | ||||
| among replicas and protocols not yet developed. NFSv4.1 only defines | ||||
| how this information is presented by the server to | ||||
| the client. | ||||
| </t> | ||||
| <section anchor="SEC11-fsli-server" numbered="true" toc="default"> | ||||
| <name>The fs_locations_server4 Structure</name> | ||||
| <t> | ||||
| The fs_locations_server4 structure consists of the following items | ||||
| in addition to the fls_server field, which specifies a network | ||||
| address or set of addresses to be used to access the specified file | ||||
| system. Note that both of these items (i.e., fls_currency and | ||||
| fls_info) | ||||
| specify attributes of the | ||||
| file system replica and should not be different when there are | ||||
| multiple fs_locations_server4 structures, each | ||||
| specifying a network path to the chosen replica, for the same | ||||
| replica. | ||||
| </t> | ||||
| <t> | ||||
| When these values are different in two fs_locations_server4 structures, | ||||
| a client has no basis for choosing one over the other and is best off | ||||
| simply ignoring both entries, whether these entries apply to migration, | ||||
| replication, or referral. When there are more than two such entries, | ||||
| majority voting can be used to exclude a single erroneous entry from | ||||
| consideration. In the case in which trunking information is provided | ||||
| for a replica currently being accessed, the additional trunked addresses | ||||
| can be ignored while access continues on the address currently being | ||||
| used, even if the entry corresponding to that path might be considered | ||||
| invalid. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| An indication of how up-to-date the file system is (fls_currency) in | ||||
| seconds. This value | ||||
| is relative to the master copy. A negative | ||||
| value indicates that the server is unable to give any | ||||
| reasonably useful value here. A value of zero indicates that the | ||||
| file system is the actual writable data or a reliably coherent | ||||
| and fully up-to-date copy. Positive values indicate how | ||||
| out-of-date this copy can normally be before it is considered for | ||||
| update. Such a value is not a guarantee that such updates | ||||
| will always be performed on the required schedule but instead | ||||
| serves as a hint about how far the copy of the data would be | ||||
| expected to be behind the most up-to-date copy. | ||||
| </li> | ||||
| <li> | ||||
| A counted array of one-byte values (fls_info) containing | ||||
| information about the particular file system instance. This | ||||
| data includes general flags, transport capability flags, | ||||
| file system equivalence class information, and selection | ||||
| priority information. The encoding will be discussed below. | ||||
| </li> | ||||
| <li> | ||||
| The server string (fls_server). For the case of the | ||||
| replica currently | ||||
| being accessed (via GETATTR), a zero-length string <bcp14>MAY</bcp14> be used to | ||||
| indicate the current address being used for the RPC call. | ||||
| The fls_server field can also be an IPv4 or IPv6 address, | ||||
| formatted the same way as an IPv4 or IPv6 address in the "server" | ||||
| field of the fs_location4 data type (see | ||||
| <xref target="fs_locations" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
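| <t> | ||||
| The three ranges of fls_currency values described above can be sketched as a | ||||
| small classifier. This is an illustration in Python, not part of the protocol; | ||||
| the function name describe_currency and the returned labels are assumptions | ||||
| made for the example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def describe_currency(fls_currency):
    """Classify an fls_currency value (seconds relative to the master copy)."""
    if fls_currency < 0:
        # Negative: the server cannot give a reasonably useful value.
        return "unknown"
    if fls_currency == 0:
        # Zero: the writable data itself or a reliably coherent, current copy.
        return "current"
    # Positive: a hint, not a guarantee, of how far behind the copy may be.
    return f"may lag by up to about {fls_currency} seconds"
```
| ]]></sourcecode> | ||||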
| <t> | ||||
| With the exception of the transport-flag field (at offset | ||||
| FSLI4BX_TFLAGS within the fls_info array), all of this data defined | ||||
| in this specification applies to the replica specified by the entry, | ||||
| rather than the specific network path used to access it. | ||||
| The classification of data in extensions to this data is discussed below. | ||||
| </t> | ||||
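| <t> | ||||
| The handling of inconsistent entries for a single replica, described earlier in | ||||
| this section, might be sketched as follows. The function name reconcile and the | ||||
| representation of each entry as an (fls_currency, fls_info) pair are assumptions | ||||
| made for illustration. The sketch does not model the case of trunking | ||||
| information for the replica currently being accessed. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from collections import Counter

def reconcile(entries):
    """Return the agreed (fls_currency, fls_info) pair for one replica,
    or None when the client has no basis for choosing."""
    if not entries:
        return None
    counts = Counter(entries)
    if len(counts) == 1:
        return entries[0]              # all entries agree
    if len(entries) > 2:
        value, votes = counts.most_common(1)[0]
        if votes == len(entries) - 1:  # exactly one dissenting entry
            return value
    return None                        # ignore the conflicting entries
```
| ]]></sourcecode> | ||||
| <t> | ||||
| With only two conflicting entries, the sketch returns None, which corresponds to | ||||
| ignoring both. | ||||
| </t> | ||||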
| <t> | ||||
| Data within the fls_info array is in the form of 8-bit data items | ||||
| with constants giving the offsets within the array of various | ||||
| values describing this particular file system instance. | ||||
| This style of | ||||
| definition was chosen, in preference to explicit XDR | ||||
| structure definitions for these values, for a number of | ||||
| reasons. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The kinds of data in the fls_info array, representing flags, | ||||
| file system classes, and priorities among sets of file systems | ||||
| representing the same data, are such that 8 bits provide | ||||
| a quite acceptable range of values. Even where there might | ||||
| be more than 256 such file system instances, having more than | ||||
| 256 distinct classes or priorities is unlikely. | ||||
| </li> | ||||
| <li> | ||||
| Explicit definition of the various specific data items within | ||||
| XDR would limit expandability in that any extension to them | ||||
| would require yet another attribute, | ||||
| leading to specification and implementation clumsiness. | ||||
| In the context of the NFSv4 extension model in effect at the time | ||||
| fs_locations_info was designed (i.e., that which is described in | ||||
| RFC 5661 <xref target="RFC5661" format="default"/>), this would | ||||
| necessitate a new minor version | ||||
| to effect any Standards Track extension to the data in fls_info. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The set of fls_info data is subject to expansion in a future minor | ||||
| version or in a Standards Track RFC within the context of a single | ||||
| minor version. The server <bcp14>SHOULD NOT</bcp14> send and the | ||||
| client <bcp14>MUST NOT</bcp14> use indices within the fls_info array | ||||
| or flag bits that are not defined in Standards Track RFCs. | ||||
| </t> | ||||
| <t> | ||||
| In light of the new extension model defined in RFC 8178 | ||||
| <xref target="RFC8178" format="default"/> | ||||
| and the fact that the individual items within fls_info are not | ||||
| explicitly referenced in the XDR, the following practices should be | ||||
| followed when extending or otherwise changing the structure of | ||||
| the data returned in fls_info within the scope of a single minor | ||||
| version: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| All extensions need to be described by Standards Track documents. | ||||
| There is no need for such documents to be marked as updating | ||||
| RFC 5661 <xref target="RFC5661" format="default"/> or this document. | ||||
| </li> | ||||
| <li> | ||||
| It needs to be made clear whether the information in any added data | ||||
| items applies to the replica specified by the entry or to the specific | ||||
| network paths specified in the entry. | ||||
| </li> | ||||
| <li> | ||||
| There needs to be a reliable way defined to determine whether the | ||||
| server is aware of the extension. This may be based on the | ||||
| length field of the fls_info array, but it is more flexible to | ||||
| provide fs-scope or server-scope attributes to indicate what | ||||
| extensions are provided. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| This encoding scheme can be adapted to the specification of | ||||
| multi-byte numeric values, even though none are currently | ||||
| defined. If extensions are made via Standards Track RFCs, | ||||
| multi-byte quantities will be encoded as a range of bytes | ||||
| with a range of indices, with the bytes interpreted in big-endian | ||||
| byte order. Further, any such index assignments will be constrained | ||||
| by the need for the relevant quantities not to | ||||
| cross XDR word boundaries. | ||||
| </t> | ||||
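| <t> | ||||
| As a sketch of how such a multi-byte quantity could be read, the following | ||||
| Python fragment decodes a big-endian value and rejects index ranges that would | ||||
| cross a 4-byte XDR word boundary. No multi-byte values are currently defined, | ||||
| so the function name read_multibyte and the indices used in the test are purely | ||||
| hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
XDR_WORD = 4  # XDR encodes data in 4-byte units

def read_multibyte(fls_info, first_index, width):
    """Read a hypothetical quantity occupying `width` consecutive indices."""
    last_index = first_index + width - 1
    if first_index // XDR_WORD != last_index // XDR_WORD:
        raise ValueError("quantity would cross an XDR word boundary")
    if last_index >= len(fls_info):
        raise ValueError("the array does not contain these indices")
    # Bytes are interpreted in big-endian byte order.
    return int.from_bytes(fls_info[first_index:last_index + 1], "big")
```
| ]]></sourcecode> | ||||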
| <t> | ||||
| The fls_info array currently contains: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Two 8-bit flag fields, one devoted to general file-system | ||||
| characteristics and a second reserved for transport-related | ||||
| capabilities. | ||||
| </li> | ||||
| <li> | ||||
| Six 8-bit class values that define various file system | ||||
| equivalence classes as explained below. | ||||
| </li> | ||||
| <li> | ||||
| Four 8-bit priority values that govern file system selection | ||||
| as explained below. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The general file system characteristics flag (at byte index | ||||
| FSLI4BX_GFLAGS) has the following | ||||
| bits defined within it: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| FSLI4GF_WRITABLE indicates that this file system target is writable, | ||||
| allowing it to be selected by clients that may need to write | ||||
| on this file system. When the current file system instance | ||||
| is writable and belongs to the same simultaneous-use | ||||
| class (as specified by the value at index FSLI4BX_CLSIMUL) | ||||
| as the instance to which the client was previously writing, it must | ||||
| incorporate within its data any committed | ||||
| write made on the source file system instance. See | ||||
| <xref target="SEC11-EFF-wv" format="default"/>, which discusses | ||||
| the write-verifier class. While there is no harm in not setting | ||||
| this flag for a file system that turns out to be writable, | ||||
| turning the flag on for a read-only file system can cause | ||||
| problems for clients that select a migration or replication | ||||
| target based on the flag and then find themselves unable to write. | ||||
| </li> | ||||
| <li> | ||||
| FSLI4GF_CUR_REQ indicates that this replica is the one on which | ||||
| the request is being made. Only a single server entry may | ||||
| have this flag set and, in the case of a referral, no entry | ||||
| will have it set. Note that this flag might be set even if the | ||||
| request was made on a network access path different from any of | ||||
| those specified in the current entry. | ||||
| </li> | ||||
| <li> | ||||
| FSLI4GF_ABSENT indicates that this entry corresponds to an absent | ||||
| file system replica. It can only be set if FSLI4GF_CUR_REQ is set. | ||||
| When both such bits are set, it indicates that a file system | ||||
| instance is not usable but that the information in the entry | ||||
| can be used to determine the sorts of continuity available | ||||
| when switching from this replica to other possible replicas. | ||||
| Since this bit can only be true if FSLI4GF_CUR_REQ is true, the | ||||
| value could be determined using the fs_status attribute, but | ||||
| the information is also made available here for the | ||||
| convenience of the client. An entry with this bit set, since it | ||||
| represents a true file system (albeit absent), does not appear | ||||
| in the event of a referral, but only when a file system has | ||||
| been accessed at this location and has subsequently been migrated. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| FSLI4GF_GOING indicates that a replica, while still available, | ||||
| should not be used further. The client, if using it, should | ||||
| make an orderly transfer to another file system instance as | ||||
| expeditiously as possible. It is expected that file systems | ||||
| going out of service will be announced as FSLI4GF_GOING some time | ||||
| before the actual loss of service. It is also expected that the | ||||
| fli_valid_for value | ||||
| will be sufficiently small to allow clients to detect and act | ||||
| on scheduled events, while large enough that the cost of the | ||||
| requests to fetch the fs_locations_info values will not be | ||||
| excessive. Values on the order of ten minutes seem | ||||
| reasonable. | ||||
| </t> | ||||
| <t> | ||||
| When this flag is seen as part of a transition into a new | ||||
| file system, a client might choose to transfer immediately | ||||
| to another replica, or it may reference the current file system | ||||
| and only transition when a migration event occurs. Similarly, | ||||
| when this flag appears for a replica in a referral, clients | ||||
| would likely avoid being referred to this instance whenever | ||||
| there is another choice. | ||||
| </t> | ||||
| <t> | ||||
| This flag, like the other items within fls_info, applies to the | ||||
| replica rather than to a particular path to that replica. When | ||||
| it appears, a transition to a new replica, rather than to a | ||||
| different path to the same replica, is indicated. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| FSLI4GF_SPLIT indicates that when a transition occurs from | ||||
| the current file system instance to this one, the replacement | ||||
| may consist of multiple file systems. In this case, the | ||||
| client has to be prepared for the possibility that objects | ||||
| on the same file system before migration will be on different ones | ||||
| after. Note that FSLI4GF_SPLIT is not incompatible with the | ||||
| file systems belonging to the same fileid | ||||
| class | ||||
| since, if one has a set of fileids that are unique within | ||||
| a file system, each subset assigned to a smaller file system after migration | ||||
| would not have any conflicts internal to that file system. | ||||
| </t> | ||||
| <t> | ||||
| A client, in the case of a split file system, will interrogate | ||||
| existing files with which it has continuing connection (it | ||||
| is free to simply forget cached filehandles). If the client | ||||
| remembers the directory filehandle associated with each open | ||||
| file, it may proceed upward using LOOKUPP to find the new file system | ||||
| boundaries. Note that in the event of a referral, there will | ||||
| not be any such files and so these actions will not be performed. | ||||
| Instead, a reference to a portion of the original | ||||
| file system now split off into other file systems | ||||
| will encounter an fsid change and possibly a | ||||
| further referral. | ||||
| </t> | ||||
| <t> | ||||
| Once the client recognizes that one file system has been split | ||||
| into two, it can prevent the disruption of running applications | ||||
| by presenting the two file systems as a single | ||||
| one until a convenient point to recognize the transition, | ||||
| such as a restart. This would require a mapping | ||||
| from the server's fsids to fsids as seen by the client, but | ||||
| this is already necessary for other reasons. As noted | ||||
| above, existing fileids within the two descendant file systems | ||||
| will not conflict. Providing non-conflicting fileids for | ||||
| newly created files on the split file systems | ||||
| is the responsibility of the server (or servers working in | ||||
| concert). The server can encode filehandles such | ||||
| that filehandles generated before the split event can be discerned | ||||
| from those generated after the split, | ||||
| allowing the server to determine when the need | ||||
| for emulating two file systems as one is over. | ||||
| </t> | ||||
| <t> | ||||
| Although it is possible for this flag to be present in the | ||||
| event of referral, it would generally be of little interest | ||||
| to the client, since the client is not expected to have | ||||
| information regarding the current contents of the absent | ||||
| file system. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
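| <t> | ||||
| The general flag bits listed above can be decoded with a few mask tests. The | ||||
| following Python sketch uses the flag values from the XDR definition and the | ||||
| byte index FSLI4BX_GFLAGS = 0 from the XDR constants; the function name | ||||
| decode_general_flags and the short flag labels are assumptions made for | ||||
| illustration. Bits not defined in this specification are ignored. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
FSLI4BX_GFLAGS = 0  # byte index of the general flag byte

GENERAL_FLAGS = {
    "WRITABLE": 0x01,  # FSLI4GF_WRITABLE
    "CUR_REQ": 0x02,   # FSLI4GF_CUR_REQ
    "ABSENT": 0x04,    # FSLI4GF_ABSENT
    "GOING": 0x08,     # FSLI4GF_GOING
    "SPLIT": 0x10,     # FSLI4GF_SPLIT
}

def decode_general_flags(fls_info):
    """Return the set of general flag labels set in an fls_info array."""
    if len(fls_info) <= FSLI4BX_GFLAGS:
        return set()
    bits = fls_info[FSLI4BX_GFLAGS]
    names = {name for name, mask in GENERAL_FLAGS.items() if bits & mask}
    # FSLI4GF_ABSENT can only be set if FSLI4GF_CUR_REQ is set.
    if "ABSENT" in names and "CUR_REQ" not in names:
        raise ValueError("FSLI4GF_ABSENT set without FSLI4GF_CUR_REQ")
    return names
```
| ]]></sourcecode> | ||||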
| <t> | ||||
| The transport-flag field (at byte index FSLI4BX_TFLAGS) contains | ||||
| the following bits related to the transport | ||||
| capabilities of the specific network path(s) specified by the | ||||
| entry: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| FSLI4TF_RDMA indicates that any specified network paths | ||||
| provide NFSv4.1 clients | ||||
| access using an RDMA-capable transport. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Attribute continuity and file system identity information are | ||||
| expressed by defining equivalence relations on the sets of | ||||
| file systems presented to the client. Each such relation | ||||
| is expressed as a set of file system equivalence classes. | ||||
| For each relation, a file system has an 8-bit class number. | ||||
| Two file systems belong to the same class if both have | ||||
| identical non-zero class numbers. Zero is treated as | ||||
| non-matching. Most often, | ||||
| the relevant question for the client will be whether a | ||||
| given replica is identical to / continuous with the current one in a | ||||
| given respect, but the information should be available also as to | ||||
| whether two other replicas match in that respect as well. | ||||
| </t> | ||||
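| <t> | ||||
| The class-matching rule can be sketched in Python as shown below. The function | ||||
| name same_class is an assumption made for illustration, as is the choice to | ||||
| treat an index missing from either array as non-matching. The index value in | ||||
| the example is taken from the XDR constants. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
FSLI4BX_CLSIMUL = 2  # byte index of the simultaneous-use class

def same_class(info_a, info_b, index):
    """True if both fls_info arrays carry the same non-zero class number."""
    if index >= len(info_a) or index >= len(info_b):
        return False  # assumption: a missing byte never matches
    # Zero is treated as non-matching, even against another zero.
    return info_a[index] != 0 and info_a[index] == info_b[index]
```
| ]]></sourcecode> | ||||
| <t> | ||||
| The same test applies to any pair of replicas, not only to the current replica | ||||
| and a candidate successor. | ||||
| </t> | ||||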
| <t> | ||||
| The following fields specify the file system's class numbers | ||||
| for the equivalence relations used in determining the nature of | ||||
| file system transitions. See Sections | ||||
| <xref target="SEC11-trans-oview" format="counter"/> | ||||
| through <xref target="SEC11-trans-server" format="counter"/> | ||||
| and their various subsections | ||||
| for details about how | ||||
| this information is to be used. Servers may assign these values | ||||
| as they wish, so long as file system instances that share the | ||||
| same value have the specified relationship to one another; | ||||
| conversely, file systems that have the specified relationship | ||||
| to one another share a common class value. As each instance | ||||
| entry is added, the relationships of this instance to previously | ||||
| entered instances can be consulted, and if one is found that | ||||
| bears the specified relationship, that entry's class value can | ||||
| be copied to the new entry. When no such previous entry exists, | ||||
| a new value for that byte index (not previously used) can be | ||||
| selected, most likely by incrementing the value of the last class | ||||
| value assigned for that index. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLSIMUL defines the | ||||
| simultaneous-use class for the file system. | ||||
| </li> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLHANDLE defines the handle | ||||
| class for the file system. | ||||
| </li> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLFILEID defines the fileid | ||||
| class for the file system. | ||||
| </li> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLWRITEVER defines the | ||||
| write-verifier class for the file system. | ||||
| </li> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLCHANGE defines the change | ||||
| class for the file system. | ||||
| </li> | ||||
| <li> | ||||
| The field with byte index FSLI4BX_CLREADDIR defines the readdir | ||||
| class for the file system. | ||||
| </li> | ||||
| </ul> | ||||
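| <t> | ||||
| One way a server might assign class values for a single byte index, following | ||||
| the copy-or-increment procedure described above, is sketched below in Python. | ||||
| The function name assign_class_values and the related callback, which stands in | ||||
| for whatever test the server uses to decide that two instances bear the | ||||
| specified relationship, are assumptions made for illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def assign_class_values(instances, related):
    """Return one class value per instance for a single byte index."""
    assigned = []   # class value chosen for each instance, in entry order
    last_value = 0  # zero is never assigned: it means "non-matching"
    for position, instance in enumerate(instances):
        value = None
        for earlier in range(position):
            if related(instances[earlier], instance):
                value = assigned[earlier]  # copy the related entry's class
                break
        if value is None:
            if last_value == 255:
                raise ValueError("more than 255 classes do not fit in 8 bits")
            last_value += 1                # next value not previously used
            value = last_value
        assigned.append(value)
    return assigned
```
| ]]></sourcecode> | ||||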
| <t> | ||||
| Server-specified preference information is also provided via | ||||
| 8-bit values within the fls_info array. The values provide a | ||||
| rank and an order (see below) to be used with separate values | ||||
| specifiable for the cases of read-only and writable file | ||||
| systems. | ||||
| These values are compared | ||||
| for different file systems to establish the server-specified | ||||
| preference, with lower values indicating "more preferred". | ||||
| </t> | ||||
| <t> | ||||
| Rank is used to express a strict server-imposed ordering on | ||||
| clients, with lower values indicating "more preferred". Clients | ||||
| should attempt to use all replicas with a given rank before they | ||||
| use one with a higher rank. Only if all of those file systems are | ||||
| unavailable should the client proceed to those of a higher rank. | ||||
| Because specifying a rank will override client preferences, servers | ||||
| should be conservative about using this mechanism, particularly | ||||
| when the environment is one in which client communication characteristics | ||||
| are neither tightly controlled nor visible to the server. | ||||
| </t> | ||||
| <t> | ||||
| Within a rank, the order value is used to specify the server's | ||||
| preference to guide the client's selection when the client's own | ||||
| preferences are not controlling, with lower values of order | ||||
| indicating "more preferred". If replicas are approximately equal | ||||
| in all respects, clients should defer to the order specified by the | ||||
| server. When clients look at server latency as part of their | ||||
| selection, they are free to use this criterion, but it is suggested | ||||
| that when latency differences are not significant, the | ||||
| server-specified order should guide selection. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The field at byte index FSLI4BX_READRANK gives the rank value to | ||||
| be used for read-only access. | ||||
| </li> | ||||
| <li> | ||||
| The field at byte index FSLI4BX_READORDER gives the order value to | ||||
| be used for read-only access. | ||||
| </li> | ||||
| <li> | ||||
| The field at byte index FSLI4BX_WRITERANK gives the rank value to | ||||
| be used for writable access. | ||||
| </li> | ||||
| <li> | ||||
| The field at byte index FSLI4BX_WRITEORDER gives the order value to | ||||
| be used for writable access. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Depending on the potential need for write access by a given client, | ||||
| one of the pairs of rank and order values is used. | ||||
| The read rank and order should only be used | ||||
| if the client knows that only reading will ever be done or if it is | ||||
| prepared to switch to a different replica in the event that any | ||||
| write access capability is required in the future. | ||||
| </t> | ||||
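| <t> | ||||
| The use of rank and order can be sketched as a sort on a (rank, order) pair, | ||||
| chosen according to whether the client may need write access. In the Python | ||||
| fragment below, the index values come from the XDR constants, while the | ||||
| function names and the (name, fls_info) representation of a replica are | ||||
| assumptions made for illustration. The sketch assumes each fls_info array has | ||||
| at least twelve bytes, and it omits client-side criteria such as latency and | ||||
| availability. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
FSLI4BX_READRANK = 8
FSLI4BX_WRITERANK = 9
FSLI4BX_READORDER = 10
FSLI4BX_WRITEORDER = 11

def preference_key(fls_info, need_write):
    """Return (rank, order) for the kind of access the client may need."""
    if need_write:
        return (fls_info[FSLI4BX_WRITERANK], fls_info[FSLI4BX_WRITEORDER])
    return (fls_info[FSLI4BX_READRANK], fls_info[FSLI4BX_READORDER])

def order_replicas(replicas, need_write):
    """Sort (name, fls_info) pairs, most preferred (lowest values) first."""
    return sorted(replicas,
                  key=lambda item: preference_key(item[1], need_write))
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Because rank is the first element of the pair, every replica of a lower rank | ||||
| sorts ahead of any replica of a higher rank, and order only breaks ties within | ||||
| a rank. | ||||
| </t> | ||||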
| </section> | ||||
| <section anchor="SEC11-fsli-info" numbered="true" toc="default"> | ||||
| <name>The fs_locations_info4 Structure</name> | ||||
| <t> | ||||
| The fs_locations_info4 structure, encoding the fs_locations_info | ||||
| attribute, contains the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The fli_flags field, which contains general flags that affect | ||||
| the interpretation of this fs_locations_info4 structure and | ||||
| all fs_locations_item4 structures within it. The only flag | ||||
| currently defined is FSLI4IF_VAR_SUB. All bits in the | ||||
| fli_flags field that are not defined should always be returned as zero. | ||||
| </li> | ||||
| <li> | ||||
| The fli_fs_root field, which contains the pathname of the root of | ||||
| the current file system on the current server, just as it does | ||||
| in the fs_locations4 structure. | ||||
| </li> | ||||
| <li> | ||||
| An array called fli_items of fs_locations_item4 structures, which contain | ||||
| information about replicas of the current file system. Where | ||||
| the current file system is actually present, or has been | ||||
| present, i.e., this is not a referral situation, one of the | ||||
| fs_locations_item4 structures will contain an fs_locations_server4 for | ||||
| the current server. This structure will have FSLI4GF_ABSENT set | ||||
| if the current file system is absent, i.e., normal access to it | ||||
| will return NFS4ERR_MOVED. | ||||
| </li> | ||||
| <li> | ||||
| The fli_valid_for field specifies a time in seconds | ||||
| for which it is reasonable for a client to use the fs_locations_info attribute | ||||
| without refetch. The fli_valid_for value does not provide a | ||||
| guarantee of validity since servers can unexpectedly go out of | ||||
| service or become inaccessible for any number of reasons. | ||||
| Clients are well-advised to refetch this information for an | ||||
| actively accessed file system every fli_valid_for seconds. This | ||||
| is particularly important when file system replicas may go out | ||||
| of service in a controlled way using the FSLI4GF_GOING flag to | ||||
| communicate an ongoing change. The server should set | ||||
| fli_valid_for to a value that allows well-behaved clients to | ||||
| notice the FSLI4GF_GOING flag and make an orderly switch before | ||||
| the loss of service becomes effective. If this value is zero, | ||||
| then no refetch interval is appropriate and the client need | ||||
| not refetch this data on any particular schedule. | ||||
| In the event of a transition to a new file system instance, a | ||||
| new value of the fs_locations_info attribute will be fetched at | ||||
| the destination. It is to be expected that this may have a | ||||
| different fli_valid_for value, which the client should then use | ||||
| in the same fashion as the previous value. Because a refetch | ||||
| of the attribute causes information from all component entries to | ||||
| be refetched, the server will typically provide a low value for | ||||
| this field if any of the replicas are likely to go out of service | ||||
| in a short time frame. Note that, because of the ability of the | ||||
| server to return NFS4ERR_MOVED to trigger the use of different paths, | ||||
| when alternate trunked paths are available, there is generally no | ||||
| need to use low values of fli_valid_for in connection with the | ||||
| management of alternate paths to the same replica. | ||||
| </li> | ||||
| </ul> | ||||
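| <t> | ||||
| A client's handling of fli_valid_for might be sketched as follows. The function | ||||
| name next_refetch_time and the use of a floating-point timestamp are | ||||
| assumptions made for illustration. The text above defines no meaning for | ||||
| negative values, so treating them like zero is also an assumption. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def next_refetch_time(fetched_at, fli_valid_for):
    """Return when to refetch fs_locations_info, or None for no schedule."""
    if fli_valid_for <= 0:
        # Zero: the client need not refetch on any particular schedule.
        return None
    # Otherwise refetch once fli_valid_for seconds have passed.
    return fetched_at + fli_valid_for
```
| ]]></sourcecode> | ||||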
| <t> | ||||
| The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable | ||||
| substitution is to be enabled. See <xref target="SEC11-fsli-item" format="default"/> | ||||
| for an explanation of variable substitution. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="SEC11-fsli-item" numbered="true" toc="default"> | ||||
| <name>The fs_locations_item4 Structure</name> | ||||
| <t> | ||||
| The fs_locations_item4 structure contains a pathname | ||||
| (in the field fli_rootpath) that encodes | ||||
| the path of the target file system replicas on the set of | ||||
| servers designated by the included fs_locations_server4 entries. | ||||
| The precise manner in which this target location | ||||
| is specified depends on the value of the FSLI4IF_VAR_SUB | ||||
| flag within the associated fs_locations_info4 structure. | ||||
| </t> | ||||
| <t> | ||||
| If this flag is not set, then fli_rootpath simply designates | ||||
| the location of the target file system within each server's | ||||
| single-server namespace just as it does for the rootpath | ||||
| within the fs_location4 structure. When this bit is set, | ||||
| however, component entries of a certain form are subject | ||||
| to client-specific variable substitution so as to allow | ||||
| a degree of namespace non-uniformity in order to accommodate | ||||
| the selection of client-specific file system targets to | ||||
| adapt to different client architectures or other | ||||
| characteristics. | ||||
| </t> | ||||
| <t> | ||||
| When such substitution is in effect, a variable beginning | ||||
| with the string "${" and ending with the string "}" | ||||
| and containing a colon is to be | ||||
| replaced by the client-specific value associated with | ||||
| that variable. The string "unknown" should be used | ||||
| by the client when it has no value for such a variable. | ||||
| The pathname resulting from such | ||||
| substitutions is used to designate the target file system, | ||||
| so that different clients may have different file systems, | ||||
| corresponding to that location in the multi-server namespace. | ||||
| </t> | ||||
| <t> | ||||
| As mentioned above, such substituted pathname variables | ||||
| contain a colon. The part before the colon is to be a | ||||
| DNS domain name, and the part after is to be a case-insensitive | ||||
| alphanumeric string. | ||||
| </t> | ||||
| <t> | ||||
| Where the domain is "ietf.org", only variable names defined | ||||
| in this document or subsequent Standards Track RFCs | ||||
| are subject to such substitution. Organizations are | ||||
| free to use their domain names to create their own sets | ||||
| of client-specific variables, to be subject to such | ||||
| substitution. In cases where such variables are intended | ||||
| to be used more broadly than a single organization, | ||||
| publication of an Informational RFC defining such variables | ||||
| is <bcp14>RECOMMENDED</bcp14>. | ||||
| </t> | ||||
| <t> | ||||
| The variable ${ietf.org:CPU_ARCH} is used to denote the | ||||
| CPU architecture for which object files are compiled. This specification | ||||
| does not limit the acceptable values (except that they must be | ||||
| valid UTF-8 strings), but such values as "x86", "x86_64", and "sparc" | ||||
| would be expected to be used in line with industry practice. | ||||
| </t> | ||||
| <t> | ||||
| The variable ${ietf.org:OS_TYPE} is used to denote the | ||||
| operating system, and thus the kernel and library APIs, | ||||
| for which code might be compiled. This specification does | ||||
| not limit the acceptable values (except that they must be | ||||
| valid UTF-8 strings), but such values as "linux" and "freebsd" | ||||
| would be expected to be used in line with industry practice. | ||||
| </t> | ||||
| <t> | ||||
| The variable ${ietf.org:OS_VERSION} is used to denote the | ||||
| operating system version, and thus the specific details | ||||
| of versioned interfaces, | ||||
| for which code might be compiled. This specification does | ||||
| not limit the acceptable values (except that they must be | ||||
| valid UTF-8 strings). However, combinations of numbers and | ||||
| letters with interspersed dots would be expected to be used | ||||
| in line with industry practice, with the details of the | ||||
| version format depending on the specific value of | ||||
| the variable ${ietf.org:OS_TYPE} with which | ||||
| it is used. | ||||
| </t> | ||||
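| <t> | ||||
| Client-side variable substitution, as described above, might be sketched in | ||||
| Python as follows. The function names, the regular expression, and the | ||||
| representation of client values as a dictionary keyed by (domain, name) are | ||||
| assumptions made for illustration. Whether a variable may occupy only part of a | ||||
| pathname component is not stated above; the sketch allows it. The underscore is | ||||
| accepted in names because the variables defined in this section contain one. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
import re

# Matches "${domain:name}", for example "${ietf.org:CPU_ARCH}".
VARIABLE = re.compile(r"\$\{([^:{}]+):([A-Za-z0-9_]+)\}")

# Variables defined for the "ietf.org" domain in this section.
IETF_VARIABLES = {"CPU_ARCH", "OS_TYPE", "OS_VERSION"}

def substitute(component, values):
    """Replace each variable in one pathname component."""
    def replace(match):
        domain = match.group(1).lower()
        name = match.group(2).upper()  # names are case-insensitive
        if domain == "ietf.org" and name not in IETF_VARIABLES:
            return match.group(0)      # not a defined variable: leave as is
        # "unknown" is used when the client has no value for the variable.
        return values.get((domain, name), "unknown")
    return VARIABLE.sub(replace, component)

def expand_rootpath(components, values, var_sub):
    """Apply substitution only when FSLI4IF_VAR_SUB is set (var_sub)."""
    if not var_sub:
        return list(components)
    return [substitute(component, values) for component in components]
```
| ]]></sourcecode> | ||||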
| <t> | ||||
| Use of these variables could result in the direction of different | ||||
| clients to different file systems on the same server, as | ||||
| appropriate to particular clients. In cases in which the | ||||
| target file systems are located on different servers, a single | ||||
| server could serve as a referral point so that each valid | ||||
| combination of variable values would designate a referral | ||||
| hosted on a single server, with the targets of those referrals on | ||||
| a number of different servers. | ||||
| </t> | ||||
| <t> | ||||
| Because namespace administration is affected by the values | ||||
| selected to substitute for various variables, clients should | ||||
| provide convenient means of determining what variable | ||||
| substitutions a client will implement, as well as, where | ||||
| appropriate, providing means to control the substitutions to | ||||
| be used. The exact means by which this will be done is | ||||
| outside the scope of this specification. | ||||
| </t> | ||||
| <t> | ||||
| Although variable substitution is most suitable for use | ||||
| in the context of referrals, it may be used in the context | ||||
| of replication and migration. If it is used in these contexts, | ||||
| the server must ensure that no matter what values the | ||||
| client presents for the substituted variables, the result | ||||
| is always a valid successor file system instance to that | ||||
| from which a transition is occurring, i.e., that the data is | ||||
| identical or represents a later image of a writable file | ||||
| system. | ||||
| </t> | ||||
| <t> | ||||
| Note that when fli_rootpath is a null pathname (that is, one | ||||
| with zero components), the file system designated is at the | ||||
| root of the specified server, whether or not the FSLI4IF_VAR_SUB | ||||
| flag within the associated fs_locations_info4 structure is | ||||
| set. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="fs_status" numbered="true" toc="default"> | ||||
| <name>The Attribute fs_status</name> | ||||
| <t> | ||||
| In an environment in which multiple copies of the same basic set of | ||||
| data are available, information regarding the particular source of | ||||
| such data and the relationships among different copies can be very | ||||
| helpful in providing consistent data to applications. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum fs4_status_type { | ||||
| STATUS4_FIXED = 1, | ||||
| STATUS4_UPDATED = 2, | ||||
| STATUS4_VERSIONED = 3, | ||||
| STATUS4_WRITABLE = 4, | ||||
| STATUS4_REFERRAL = 5 | ||||
| }; | ||||
| struct fs4_status { | ||||
| bool fss_absent; | ||||
| fs4_status_type fss_type; | ||||
| utf8str_cs fss_source; | ||||
| utf8str_cs fss_current; | ||||
| int32_t fss_age; | ||||
| nfstime4 fss_version; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The boolean fss_absent indicates whether the file system is | ||||
| currently absent. This value will be set if the file system was | ||||
| previously present and becomes absent, or if the file system has | ||||
| never been present and the type is STATUS4_REFERRAL. When this | ||||
| boolean is set and the type is not STATUS4_REFERRAL, the | ||||
| remaining information in the fs4_status reflects the values that | ||||
| were last valid when the file system was present. | ||||
| </t> | ||||
| <t> | ||||
| The fss_type field indicates the kind of file system image represented. | ||||
| This is of particular importance when using the version values to | ||||
| determine appropriate succession of file system images. | ||||
| When fss_absent is set, and the file system was previously | ||||
| present, the value of fss_type reflects the type in effect when the file system was last present. | ||||
| Five values are distinguished: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| STATUS4_FIXED, which indicates a read-only image in the sense | ||||
| that it will never change. The possibility is allowed that, as | ||||
| a result of migration or switch to a different image, changed | ||||
| data can be accessed, but within the confines of this instance, | ||||
| no change is allowed. The client can use this fact to | ||||
| cache aggressively. | ||||
| </li> | ||||
| <li> | ||||
| STATUS4_VERSIONED, which indicates that the image, like the | ||||
| STATUS4_UPDATED case, is updated externally, but it provides | ||||
| a guarantee that the server will carefully update an | ||||
| associated version value so that the client can | ||||
| protect itself from a situation in which it reads | ||||
| data from one version of the file system and then later reads | ||||
| data from an earlier version of the same file system. See | ||||
| below for a discussion of how this can be done. | ||||
| </li> | ||||
| <li> | ||||
| STATUS4_UPDATED, which indicates an image that cannot be | ||||
| updated by the user writing to it but that may be changed | ||||
| externally, typically because it is a periodically updated | ||||
| copy of another writable file system somewhere else. In | ||||
| this case, version information is not provided, and the | ||||
| client does not have the responsibility of making sure | ||||
| that this version only advances upon a file system instance | ||||
| transition. In this case, it is the responsibility of the | ||||
| server to make sure that the data presented after a file | ||||
| system instance transition is a proper successor image and | ||||
| includes all changes seen by the client and any change made | ||||
| before all such changes. | ||||
| </li> | ||||
| <li> | ||||
| STATUS4_WRITABLE, which indicates that the file system is an | ||||
| actual writable one. The client need not, of course, actually | ||||
| write to the file system, but once it does, it should not | ||||
| accept a transition to anything other than a writable instance | ||||
| of that same file system. | ||||
| </li> | ||||
| <li> | ||||
| STATUS4_REFERRAL, which indicates that the file system in | ||||
| question is absent and has never been present on this | ||||
| server. | ||||
| </li> | ||||
| </ul> | ||||
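| <t> | ||||
| The five values can be mirrored on the client as a small enumeration. | ||||
| The following is a minimal sketch, not part of the protocol: the class | ||||
| name Fs4StatusType and the helper functions are hypothetical, and only | ||||
| the numeric values are taken from the XDR definition above. | ||||
| </t> | ||||

```python
from enum import IntEnum


class Fs4StatusType(IntEnum):
    """Mirrors enum fs4_status_type; the numeric values come from the XDR."""
    STATUS4_FIXED = 1
    STATUS4_UPDATED = 2
    STATUS4_VERSIONED = 3
    STATUS4_WRITABLE = 4
    STATUS4_REFERRAL = 5


def may_change_within_instance(fss_type: Fs4StatusType) -> bool:
    """Return True if data can change while the client stays on this instance.

    Only STATUS4_FIXED promises that nothing changes within the instance.
    STATUS4_REFERRAL designates an absent file system, so it holds no data.
    """
    return fss_type in (
        Fs4StatusType.STATUS4_UPDATED,
        Fs4StatusType.STATUS4_VERSIONED,
        Fs4StatusType.STATUS4_WRITABLE,
    )


def acceptable_after_write(successor_type: Fs4StatusType) -> bool:
    """Once the client has written, only a writable successor is acceptable."""
    return successor_type == Fs4StatusType.STATUS4_WRITABLE
```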
| <t> | ||||
| Note that in the STATUS4_UPDATED and STATUS4_VERSIONED cases, the | ||||
| server is responsible for the appropriate handling of locks and | ||||
| delegations that are inconsistent with external changes. | ||||
| If a server gives out delegations, they <bcp14>SHOULD</bcp14> be recalled | ||||
| before an inconsistent change is made to the data, and <bcp14>MUST</bcp14> | ||||
| be revoked if this is not possible. Similarly, if an OPEN is | ||||
| inconsistent with data that is changed (the OPEN has | ||||
| OPEN4_SHARE_DENY_WRITE/OPEN4_SHARE_DENY_BOTH | ||||
| and the data is changed), that OPEN <bcp14>SHOULD</bcp14> be considered | ||||
| administratively revoked. | ||||
| </t> | ||||
| <t> | ||||
| The opaque strings fss_source and fss_current provide a way of presenting | ||||
| information about the source of the file system image being presented. | ||||
| It is not intended that the client do anything with this information | ||||
| other than make it available to administrative tools. It is | ||||
| intended that this information be helpful when researching possible | ||||
| problems with a file system image that might arise when it is | ||||
| unclear if the correct image is being accessed and, if not, how that | ||||
| image came to be made. This kind of diagnostic information will be | ||||
| helpful, if, as seems likely, copies of file systems are made in | ||||
| many different ways (e.g., simple user-level copies, | ||||
| file-system-level point-in-time copies, | ||||
| clones of the underlying storage), | ||||
| under a variety of administrative arrangements. In such | ||||
| environments, determining how a given set of data was constructed | ||||
| can be very helpful in resolving problems. | ||||
| </t> | ||||
| <t> | ||||
| The opaque string fss_source is used to indicate the source of a | ||||
| given file system with the expectation that tools capable of | ||||
| creating a file system image propagate this information, when | ||||
| possible. It is understood that this may not always be possible | ||||
| since a user-level copy may be thought of as creating a new data | ||||
| set and the tools used may have no mechanism to propagate this | ||||
| data. When a file system is initially created, it is desirable | ||||
| to associate with it | ||||
| data regarding how the file system was created, where it was | ||||
| created, who created it, etc. Making this information available | ||||
| in this attribute in a human-readable | ||||
| string will be helpful for applications and | ||||
| system administrators and will also serve to make it available when | ||||
| the original file system is used to make subsequent copies. | ||||
| </t> | ||||
| <t> | ||||
| The opaque string fss_current should provide whatever information is | ||||
| available about the source of the current copy. Such | ||||
| information includes | ||||
| the tool creating it, any relevant parameters to that tool, the | ||||
| time at which the copy was done, the user making the change, the | ||||
| server on which the change was made, etc. All information should be | ||||
| in a human-readable string. | ||||
| </t> | ||||
| <t> | ||||
| The field fss_age provides an indication of how out-of-date the file system | ||||
| currently is with respect to its ultimate data source (in case of | ||||
| cascading data updates). This complements the fls_currency field of | ||||
| fs_locations_server4 (see <xref target="SEC11-li-new" format="default"/>) in the | ||||
| following way: the information in fls_currency | ||||
| gives a bound for how out of date the data in a file system might | ||||
| typically get, while the value in fss_age gives a bound on how out-of-date that | ||||
| data actually is. Negative values imply that no information is | ||||
| available. A zero means that this data is known to be current. | ||||
| A positive value means that this data is known to be no older than | ||||
| that number of seconds with respect to the ultimate data source. | ||||
| Using this value, the client may be able to decide that a data copy | ||||
| is too old, so that it may search for a newer version to use. | ||||
| </t> | ||||
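| <t> | ||||
| The three ranges of fss_age can be interpreted as in the sketch below. | ||||
| The function names and the max_age_seconds threshold are hypothetical | ||||
| client policy, not protocol elements; treating an unknown age as | ||||
| acceptable is likewise an assumption made only for illustration. | ||||
| </t> | ||||

```python
def describe_fss_age(fss_age: int) -> str:
    """Translate an fss_age value (int32_t, in seconds) into its meaning."""
    if fss_age == 0:
        return "current"
    if fss_age > 0:
        return f"no older than {fss_age} seconds"
    # Negative values imply that no information is available.
    return "unknown"


def may_be_too_old(fss_age: int, max_age_seconds: int) -> bool:
    """Return True when the age bound exceeds the client's tolerance.

    A positive fss_age is only an upper bound on staleness, so True means
    the copy might be too old and a newer one could be worth looking for.
    Unknown (negative) ages return False under this sketch's assumed policy.
    """
    return fss_age > max_age_seconds
```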
| <t> | ||||
| The fss_version field provides a version identification, in the form of | ||||
| a time value, such that successive versions always have later time | ||||
| values. When the fss_type is anything other than | ||||
| STATUS4_VERSIONED, the server may provide such a value, but there is | ||||
| no guarantee as to its validity and clients will not use it except | ||||
| to provide additional information to add to fss_source and fss_current. | ||||
| </t> | ||||
| <t> | ||||
| When fss_type is STATUS4_VERSIONED, servers <bcp14>SHOULD</bcp14> provide a value | ||||
| of fss_version that progresses monotonically whenever any new version | ||||
| of the data is established. This allows the client, if reliable | ||||
| image progression is important to it, to fetch this attribute as | ||||
| part of each COMPOUND where data or metadata from the file system is | ||||
| used. | ||||
| </t> | ||||
| <t> | ||||
| When it is important to the client to make sure that only valid | ||||
| successor images are accepted, it must make sure that it does not | ||||
| read data or metadata from the file system without updating its | ||||
| sense of the current state of the image. This is to avoid the possibility | ||||
| that the fs_status that the client holds will be one for an | ||||
| earlier image, which would cause the client to accept a new file | ||||
| system instance that is later than that but still earlier than | ||||
| the updated data read by the client. | ||||
| </t> | ||||
| <t> | ||||
| In order to accept valid images reliably, the client must do a GETATTR of the fs_status | ||||
| attribute that follows any interrogation of data or metadata within the | ||||
| file system in question. Often this is most conveniently done by | ||||
| appending such a GETATTR after all other operations that reference | ||||
| a given file system. When errors occur between reading file system | ||||
| data and performing such a GETATTR, care must be exercised to make | ||||
| sure that the data in question is not used before obtaining the | ||||
| proper fs_status value. In this connection, when an OPEN is done | ||||
| within such a versioned file system and the associated GETATTR of | ||||
| fs_status is not successfully completed, the open file in question | ||||
| must not be accessed until that fs_status is fetched. | ||||
| </t> | ||||
| <t> | ||||
| The procedure above will ensure that before using any data from the | ||||
| file system the client has in hand a newly-fetched current version | ||||
| of the file system image. Multiple values for multiple requests in | ||||
| flight can be resolved by assembling them into the required partial | ||||
| order (and the elements should form a total order within the | ||||
| partial order) and | ||||
| using the last. | ||||
| The client may then, when switching among | ||||
| file system instances, decline to use an instance that does not have | ||||
| an fss_type of STATUS4_VERSIONED or whose fss_version field is earlier than the | ||||
| last one obtained from the predecessor file system instance. | ||||
| </t> | ||||
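| <t> | ||||
| The acceptance test described above can be sketched as follows. This is | ||||
| an illustration under stated assumptions: NfsTime stands in for nfstime4 | ||||
| as a (seconds, nseconds) pair, and the function names are hypothetical. | ||||
| An instance whose version equals the last one obtained is accepted, | ||||
| since only earlier versions are to be declined. | ||||
| </t> | ||||

```python
from typing import Iterable, NamedTuple

STATUS4_VERSIONED = 3  # value from enum fs4_status_type


class NfsTime(NamedTuple):
    """Stand-in for nfstime4; tuples compare seconds first, then nseconds."""
    seconds: int
    nseconds: int


def latest_version(observed: Iterable[NfsTime]) -> NfsTime:
    """Pick the last fss_version among values gathered from requests in flight."""
    return max(observed)


def accept_successor(last_seen: NfsTime, new_type: int, new_version: NfsTime) -> bool:
    """Decide whether a new file system instance is a valid successor."""
    if new_type != STATUS4_VERSIONED:
        # Without STATUS4_VERSIONED, fss_version carries no guarantee.
        return False
    # Decline any instance that is earlier than the last version obtained.
    return new_version >= last_seen
```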
| </section> | ||||
| </section> | ||||
| <section anchor="pnfs" numbered="true" toc="default"> | ||||
| <name>Parallel NFS (pNFS)</name> | ||||
| <section anchor="pnfs_intro" numbered="true" toc="default"> | ||||
| <name>Introduction</name> | ||||
| <t> | ||||
| pNFS is an <bcp14>OPTIONAL</bcp14> feature within NFSv4.1; the pNFS feature | ||||
| set allows direct client access to the storage devices containing | ||||
| file data. When file data for a single NFSv4 server is stored on | ||||
| multiple and/or higher-throughput storage devices (by comparison to | ||||
| the server's throughput capability), the result can be significantly | ||||
| better file access performance. The relationship among multiple | ||||
| clients, a single server, and multiple storage devices for pNFS | ||||
| (server and clients have access to all storage devices) is shown in | ||||
| <xref target="fig_system" format="default"/>. | ||||
| </t> | ||||
| <figure anchor="fig_system"> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| +-----------+ | ||||
| |+-----------+ +-----------+ | ||||
| ||+-----------+ | | | ||||
| ||| | NFSv4.1 + pNFS | | | ||||
| +|| Clients |<------------------------------>| Server | | ||||
| +| | | | | ||||
| +-----------+ | | | ||||
| ||| +-----------+ | ||||
| ||| | | ||||
| ||| | | ||||
| ||| Storage +-----------+ | | ||||
| ||| Protocol |+-----------+ | | ||||
| ||+----------------||+-----------+ Control | | ||||
| |+-----------------||| | Protocol| | ||||
| +------------------+|| Storage |------------+ | ||||
| +| Devices | | ||||
| +-----------+ | ||||
| ]]></artwork> | ||||
| </figure> | ||||
| <t> | ||||
| In this model, the clients, server, and storage devices are | ||||
| responsible for managing file access. This is in contrast to NFSv4 | ||||
| without pNFS, where it is primarily the server's responsibility; some | ||||
| of this responsibility may be delegated to the client under strictly | ||||
| specified conditions. See <xref target="storage_protocol" format="default"/> | ||||
| for a discussion of the Storage Protocol. See <xref target="control_protocol" format="default"/> for a | ||||
| discussion of the Control Protocol. | ||||
| </t> | ||||
| <t> | ||||
| pNFS takes the form of <bcp14>OPTIONAL</bcp14> operations that manage protocol | ||||
| objects called 'layouts' (<xref target="layout_types" format="default"/>) that | ||||
| contain a byte-range and storage location information. The layout | ||||
| is managed in a similar fashion | ||||
| as NFSv4.1 data delegations. For example, the layout is leased, | ||||
| recallable, and revocable. However, layouts are distinct abstractions | ||||
| and are manipulated with new operations. When a client holds a | ||||
| layout, it is granted the ability to directly access the byte-range | ||||
| at the storage location specified in the layout. | ||||
| </t> | ||||
| <t> | ||||
| There are interactions between layouts and other NFSv4.1 | ||||
| abstractions such as data delegations and byte-range locking. | ||||
| Delegation issues are discussed in <xref target="recalling_layout" format="default"/>. Byte-range locking issues are | ||||
| discussed in Sections <xref target="layout_iomode" format="counter"/> and <xref target="layout_semantics" format="counter"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>pNFS Definitions</name> | ||||
| <t> | ||||
| NFSv4.1's pNFS feature provides parallel data access to a | ||||
| file system that stripes its content across multiple | ||||
| storage servers. The first instantiation of pNFS, as | ||||
| part of NFSv4.1, separates the file system protocol | ||||
| processing into two parts: metadata processing and data | ||||
| processing. Data consist of the contents of regular | ||||
| files that are striped across storage servers. Data | ||||
| striping occurs in at least two ways: on a file-by-file | ||||
| basis and, within sufficiently large files, on a | ||||
| block-by-block basis. In contrast, striped access to | ||||
| metadata by pNFS clients is not provided in NFSv4.1, even | ||||
| though the file system back end of a pNFS server might | ||||
| stripe metadata. Metadata consist of everything else, | ||||
| including the contents of non-regular files (e.g., | ||||
| directories); see <xref target="metadata" format="default"/>. The | ||||
| metadata functionality is implemented by an NFSv4.1 | ||||
| server that supports pNFS and the operations described in | ||||
| <xref target="nfsv41operations" format="default"/>; such a server is | ||||
| called a metadata server (<xref target="mds" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The data functionality is implemented by one or more storage devices, each of which | ||||
| is accessed by the client via a storage protocol. A subset (defined in <xref target="ds_ops" format="default"/>) of NFSv4.1 is one such storage protocol. New terms are | ||||
| introduced to the NFSv4.1 nomenclature and existing terms are | ||||
| clarified to allow for the description of the pNFS feature. | ||||
| </t> | ||||
| <section anchor="metadata" numbered="true" toc="default"> | ||||
| <name>Metadata</name> | ||||
| <t> | ||||
| Information about a file system object, such as its name, location | ||||
| within the namespace, owner, ACL, and other attributes. Metadata may | ||||
| also include storage location information, and this will vary based | ||||
| on the underlying storage mechanism that is used. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="mds" numbered="true" toc="default"> | ||||
| <name>Metadata Server</name> | ||||
| <t> | ||||
| An NFSv4.1 server that supports the pNFS feature. A variety of | ||||
| architectural choices exist for the metadata server and its use of | ||||
| file system information held at the server. Some servers may | ||||
| contain metadata only for file objects residing at the | ||||
| metadata server, while the file data resides on associated storage | ||||
| devices. Other metadata servers may hold both metadata and a | ||||
| varying degree of file data. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>pNFS Client</name> | ||||
| <t> | ||||
| An NFSv4.1 client that supports pNFS operations and supports at | ||||
| least one storage protocol for performing I/O | ||||
| to storage devices. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Storage Device</name> | ||||
| <t> | ||||
| A storage device stores a regular file's data, but leaves metadata | ||||
| management to the metadata server. A storage device could be | ||||
| another NFSv4.1 server, an object-based storage device (OSD), | ||||
| a block | ||||
| device accessed over a System Area Network (SAN, e.g., either | ||||
| Fibre Channel or iSCSI SAN), or some other entity. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="storage_protocol" numbered="true" toc="default"> | ||||
| <name>Storage Protocol</name> | ||||
| <t> | ||||
| As noted in <xref target="fig_system" format="default"/>, | ||||
| the storage protocol is the method used by the client to | ||||
| store and retrieve data directly from the storage devices. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 pNFS feature has been structured to allow for a variety | ||||
| of storage protocols to be defined and used. | ||||
| One example storage protocol is NFSv4.1 itself (as documented in | ||||
| <xref target="file_layout_type" format="default"/>). Other options for the storage protocol | ||||
| are described elsewhere and include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Block/volume protocols such as Internet SCSI (iSCSI) | ||||
| <xref target="RFC3720" format="default"/> and FCP <xref target="FCP-2" format="default"/>. The block/volume | ||||
| protocol support can be independent of the addressing structure | ||||
| of the block/volume protocol used, allowing more than one | ||||
| protocol to access the same file data and enabling extensibility | ||||
| to other block/volume protocols. See | ||||
| <xref target="RFC5663" format="default"/> for a layout | ||||
| specification that | ||||
| allows pNFS to use block/volume storage protocols. | ||||
| </li> | ||||
| <li> | ||||
| Object protocols such as OSD over iSCSI or Fibre Channel <xref target="OSD-T10" format="default"/>. See | ||||
| <xref target="RFC5664" format="default"/> for a layout specification | ||||
| that allows pNFS to use object storage protocols. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Various storage protocols may be available to both client and | ||||
| server, and it is possible that a client and server do not have | ||||
| a matching storage protocol available to them. | ||||
| Because of this, the pNFS server <bcp14>MUST</bcp14> support normal NFSv4.1 access | ||||
| to any file accessible by the pNFS feature; this will allow for | ||||
| continued interoperability between an NFSv4.1 client and server. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="control_protocol" numbered="true" toc="default"> | ||||
| <name>Control Protocol</name> | ||||
| <t> | ||||
| As noted in <xref target="fig_system" format="default"/>, | ||||
| the control protocol is used by the exported file system between the | ||||
| metadata server and storage devices. Specification of such | ||||
| protocols is outside the scope of the NFSv4.1 protocol. Such | ||||
| control protocols would be used to control activities such as the | ||||
| allocation and deallocation of storage, the management of state | ||||
| required by the storage devices to perform client access control, | ||||
| and, depending on the storage protocol, the enforcement of | ||||
| authentication and authorization so that restrictions that | ||||
| would be enforced by the metadata server are also enforced by | ||||
| the storage device. | ||||
| </t> | ||||
| <t> | ||||
| A particular control protocol is not <bcp14>REQUIRED</bcp14> by NFSv4.1 but | ||||
| requirements are placed on the control protocol for maintaining | ||||
| attributes like modify time, the change attribute, and the end-of-file | ||||
| (EOF) position. Note that if pNFS is layered over a clustered, parallel | ||||
| file system (e.g., <xref target="PVFS" format="default">PVFS</xref>), the mechanisms that | ||||
| enable clustering and parallelism in that file system can be considered | ||||
| the control protocol. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="layout_types" numbered="true" toc="default"> | ||||
| <name>Layout Types</name> | ||||
| <t> | ||||
| A layout describes the mapping of a file's data to the storage | ||||
| devices that hold the data. A layout is said to belong to a | ||||
| specific layout type (data type layouttype4, see <xref target="layouttype4" format="default"/>). The layout type allows for variants to | ||||
| handle different storage protocols, such as those associated with | ||||
| block/volume <xref target="RFC5663" format="default"/>, object <xref target="RFC5664" format="default"/>, and file (<xref target="file_layout_type" format="default"/>) layout types. A metadata server, along with its control | ||||
| protocol, <bcp14>MUST</bcp14> support at least one layout type. A private | ||||
| sub-range of the layout type namespace is also defined. Values from | ||||
| the private layout type range <bcp14>MAY</bcp14> be used for internal testing or | ||||
| experimentation (see <xref target="layouttype4" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| As an example, the organization of the file layout type could be | ||||
| an array of tuples (e.g., device ID, filehandle), along with a | ||||
| definition of how the data is | ||||
| stored across the devices (e.g., striping). A block/volume layout | ||||
| might be an array of tuples that store <device ID, block number, | ||||
| block count> | ||||
| along with information about block size and the | ||||
| associated file offset of the block number. An object layout might | ||||
| be an array of tuples <device ID, object ID> and an additional | ||||
| structure (i.e., the aggregation map) that defines how the logical | ||||
| byte sequence of the file data is serialized into the different | ||||
| objects. Note that the actual layouts are typically more complex | ||||
| than these simple expository examples. | ||||
| </t> | ||||
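| <t> | ||||
| To make the file layout example concrete, the sketch below maps a file | ||||
| offset to one (device ID, filehandle) tuple. It assumes simple | ||||
| round-robin striping with a fixed stripe unit; the function name, the | ||||
| parameter names, and the placement rule are illustrative assumptions, | ||||
| and actual layouts are typically more complex. | ||||
| </t> | ||||

```python
from typing import List, Tuple

# One entry per stripe position: (device ID, filehandle on that device).
Entry = Tuple[str, bytes]


def locate_stripe(offset: int, stripe_unit: int, entries: List[Entry]) -> Entry:
    """Return the entry holding the byte at `offset` under round-robin striping."""
    if not stripe_unit > 0:
        raise ValueError("stripe_unit must be positive")
    if not entries:
        raise ValueError("layout has no entries")
    # Consecutive stripe units rotate through the entries in order.
    stripe_index = (offset // stripe_unit) % len(entries)
    return entries[stripe_index]
```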
| <t> | ||||
| Requests for pNFS-related operations will often specify a layout | ||||
| type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. | ||||
| The response for these operations will include structures such | ||||
| as a device_addr4 or a layout4, each of which includes a layout type within | ||||
| it. The layout type sent by the server <bcp14>MUST</bcp14> always be the same | ||||
| one requested by the client. When a server sends a response that | ||||
| includes a different layout type, the client <bcp14>SHOULD</bcp14> ignore the | ||||
| response and behave as if the server had returned an error response. | ||||
| </t> | ||||
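| <t> | ||||
| A client-side check for this rule might look like the sketch below. | ||||
| The exception class and function name are hypothetical; raising an | ||||
| exception is one way to model behaving as if the server had returned | ||||
| an error response. | ||||
| </t> | ||||

```python
LAYOUT4_NFSV4_1_FILES = 1
LAYOUT4_OSD2_OBJECTS = 2
LAYOUT4_BLOCK_VOLUME = 3


class LayoutTypeMismatch(Exception):
    """Raised when a response carries a layout type that was not requested."""


def check_layout_type(requested: int, returned: int) -> None:
    """Treat a response with an unexpected layout type as an error."""
    if returned != requested:
        raise LayoutTypeMismatch(
            f"requested layout type {requested}, server returned {returned}"
        )
```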
| </section> | ||||
| <section anchor="layout" numbered="true" toc="default"> | ||||
| <name>Layout</name> | ||||
| <t> | ||||
| A layout defines how a file's data is organized on one or more | ||||
| storage devices. There are many potential layout types; each | ||||
| layout type is differentiated by the storage protocol used to | ||||
| access data and by the aggregation scheme that lays out the file | ||||
| data on the underlying storage devices. A layout is precisely | ||||
| identified by the tuple <client ID, filehandle, layout | ||||
| type, iomode, range>, where filehandle refers to the filehandle | ||||
| of the file on the metadata server. | ||||
| </t> | ||||
| <t> | ||||
| It is important to define when layouts overlap and/or conflict with | ||||
| each other. For two layouts with overlapping byte-ranges to | ||||
| actually overlap each other, both layouts must be of the same layout | ||||
| type, correspond to the same filehandle, and have the same iomode. | ||||
| Layouts conflict when they overlap and differ in the content of the | ||||
| layout (i.e., the storage device/file mapping parameters differ). | ||||
| Note that differing iomodes do not lead to conflicting layouts. It | ||||
| is permissible for layouts with different iomodes, pertaining to the | ||||
| same byte-range, to be held by the same client. An example of this | ||||
| would be copy-on-write functionality for a block/volume layout type. | ||||
| </t> | ||||
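| <t> | ||||
| The overlap and conflict rules can be expressed as two predicates. | ||||
| This is a minimal sketch: the Layout class and its field names are | ||||
| hypothetical, the client ID is omitted, the content field stands in for | ||||
| the opaque storage device/file mapping parameters, and byte-ranges are | ||||
| assumed to have finite lengths. | ||||
| </t> | ||||

```python
from dataclasses import dataclass

LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2


@dataclass(frozen=True)
class Layout:
    filehandle: bytes
    layout_type: int
    iomode: int
    offset: int
    length: int
    content: bytes  # opaque mapping parameters


def ranges_intersect(a: Layout, b: Layout) -> bool:
    """True when the two byte-ranges share at least one byte."""
    return a.offset + a.length > b.offset and b.offset + b.length > a.offset


def layouts_overlap(a: Layout, b: Layout) -> bool:
    """Overlap needs the same layout type, filehandle, and iomode."""
    return (
        a.layout_type == b.layout_type
        and a.filehandle == b.filehandle
        and a.iomode == b.iomode
        and ranges_intersect(a, b)
    )


def layouts_conflict(a: Layout, b: Layout) -> bool:
    """Layouts conflict when they overlap and differ in content."""
    return layouts_overlap(a, b) and a.content != b.content
```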
| </section> | ||||
| <section anchor="layout_iomode" numbered="true" toc="default"> | ||||
| <name>Layout Iomode</name> | ||||
| <t> | ||||
| The layout iomode (data type layoutiomode4, see <xref target="layoutiomode4" format="default"/>) indicates to the metadata server the | ||||
| client's intent to perform either just READ operations | ||||
| or a mixture containing READ | ||||
| and WRITE operations. For certain layout | ||||
| types, it is useful for a client to specify this intent at the time it sends LAYOUTGET | ||||
| (<xref target="OP_LAYOUTGET" format="default"/>). For example, for | ||||
| block/volume-based protocols, block allocation could occur when a | ||||
| LAYOUTIOMODE4_RW iomode is specified. A special LAYOUTIOMODE4_ANY iomode is defined | ||||
| and can only be used for LAYOUTRETURN and CB_LAYOUTRECALL, not for | ||||
| LAYOUTGET. It specifies that layouts pertaining to both LAYOUTIOMODE4_READ and | ||||
| LAYOUTIOMODE4_RW iomodes are being returned or recalled, respectively. | ||||
| </t> | ||||
| <t> | ||||
| A storage device may validate I/O with regard to the iomode; this | ||||
| is dependent upon storage device implementation and layout type. | ||||
| Thus, if the client's layout iomode is inconsistent with the I/O | ||||
| being performed, the storage device may reject the client's I/O with | ||||
| an error indicating that a new layout with the correct iomode should be | ||||
| obtained via LAYOUTGET. For example, if a client gets a layout with a LAYOUTIOMODE4_READ iomode and | ||||
| performs a WRITE to a storage device, the storage device is allowed | ||||
| to reject that WRITE. | ||||
| </t> | ||||
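| <t> | ||||
| A storage device that chooses to validate I/O against the iomode could | ||||
| apply a check along these lines. The function name is hypothetical, and | ||||
| whether any such check is performed depends on the storage device | ||||
| implementation and layout type. | ||||
| </t> | ||||

```python
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2


def io_permitted(layout_iomode: int, is_write: bool) -> bool:
    """Return True if the I/O is consistent with the layout's iomode."""
    if is_write:
        # A WRITE is only consistent with a read/write layout.
        return layout_iomode == LAYOUTIOMODE4_RW
    # A READ is consistent with either iomode.
    return layout_iomode in (LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW)
```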
| <t> | ||||
| The use of the layout iomode does not conflict with OPEN share modes or byte-range LOCK operations; | ||||
| open share mode and byte-range lock conflicts are enforced as they are without the | ||||
| use of pNFS and are logically separate from the pNFS layout level. | ||||
| Open share modes and byte-range locks are the preferred method for | ||||
| restricting user access to data files. For example, an OPEN of | ||||
| OPEN4_SHARE_ACCESS_WRITE does not conflict with a LAYOUTGET containing an iomode | ||||
| of LAYOUTIOMODE4_RW performed by another client. Applications that depend | ||||
| on writing into the same file concurrently may use byte-range locking to | ||||
| serialize their accesses. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="device_ids" numbered="true" toc="default"> | ||||
| <name>Device IDs</name> | ||||
| <t> | ||||
| The device ID (data type deviceid4, see | ||||
| <xref target="deviceid4" format="default"/>) identifies a group of storage devices. The scope | ||||
| of a device ID is the pair <client ID, layout type>. In practice, a | ||||
| significant amount of information may be required to fully address | ||||
| a storage device. Rather than embedding all such information in a | ||||
| layout, layouts embed device IDs. The NFSv4.1 operation | ||||
| GETDEVICEINFO (<xref target="OP_GETDEVICEINFO" format="default"/>) is used to | ||||
| retrieve the complete address information (including | ||||
| all device addresses for the device ID) regarding the storage | ||||
| device according to its layout type and device ID. For example, | ||||
| the address of an NFSv4.1 data server or of an object-based storage | ||||
| device could be an IP address and port. The address of a block | ||||
| storage device could be a volume label. | ||||
| </t> | ||||
| <t> | ||||
| Clients cannot expect the mapping between a device ID and | ||||
| its storage device address(es) to persist across metadata server restart. | ||||
| See <xref target="mds_recovery" format="default"/> for a description of how | ||||
| recovery works in that situation. | ||||
| </t> | ||||
| <t> | ||||
| A device ID lives as long as there is a layout | ||||
| referring to the device ID. If there are no layouts | ||||
| referring to the device ID, the server is free to | ||||
| delete the device ID at any time. | ||||
| Once a device ID is deleted by the server, the server <bcp14>MUST NOT</bcp14> | ||||
| reuse the device ID for the same layout type and client ID again. | ||||
| This requirement is feasible because the device ID is 16 bytes | ||||
| long, leaving sufficient room to store a generation number if the | ||||
| server's implementation requires most of the rest of the device ID's | ||||
| content to be reused. This requirement is necessary because | ||||
| otherwise the race conditions between asynchronous notification | ||||
| of device ID addition and deletion would be too difficult to | ||||
| sort out. | ||||
| </t> | ||||
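| <t> | ||||
| One way a server might use the 16 bytes to avoid reuse is sketched | ||||
| below. The split into an 8-byte slot and an 8-byte generation number, | ||||
| and the DeviceIdAllocator class itself, are assumptions made for | ||||
| illustration; one allocator instance is assumed per client ID and | ||||
| layout type pair. | ||||
| </t> | ||||

```python
import struct

DEVICEID4_SIZE = 16  # length of deviceid4 in bytes


class DeviceIdAllocator:
    """Hands out device IDs that never repeat, even when a slot is reused."""

    def __init__(self) -> None:
        self._next_generation = {}  # slot number -> next generation to use

    def allocate(self, slot: int) -> bytes:
        generation = self._next_generation.get(slot, 0)
        self._next_generation[slot] = generation + 1
        # Two big-endian unsigned 64-bit fields: slot, then generation.
        device_id = struct.pack(">QQ", slot, generation)
        assert len(device_id) == DEVICEID4_SIZE
        return device_id
```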
| <t> | ||||
| Device ID to device address mappings are not leased, | ||||
| and can be changed at any time. (Note that while | ||||
| device ID to device address mappings are likely | ||||
| to change after the metadata server restarts, the | ||||
| server is not required to change the mappings.) | ||||
| A server has two | ||||
| choices for changing mappings. It can recall all | ||||
| layouts referring to the device ID or it can use a | ||||
| notification mechanism. | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol has no optimal way to recall | ||||
| all layouts that referred to a particular device ID | ||||
| (unless the server associates a single device ID with | ||||
| a single fsid or a single client ID; in which case, | ||||
| CB_LAYOUTRECALL has options for recalling all layouts | ||||
| associated with the fsid, client ID pair, or just the | ||||
| client ID). | ||||
| </t> | ||||
| <t> | ||||
| Via a notification mechanism | ||||
| (see <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>), | ||||
| device ID to device address mappings can change over the duration | ||||
| of server operation without recalling or revoking the layouts that | ||||
| refer to the device ID. The notification mechanism can also delete | ||||
| a device ID, but only if the client has no layouts referring | ||||
| to the device ID. | ||||
| A notification of a change to a device ID to device address | ||||
| mapping will immediately or eventually invalidate some or all of | ||||
| the device ID's mappings. | ||||
| The server <bcp14>MUST</bcp14> support notifications and the client must | ||||
| request them before they can be used. For further information | ||||
| about the notification types, see <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="pnfs_ops" numbered="true" toc="default"> | ||||
| <name>pNFS Operations</name> | ||||
| <t> | ||||
| NFSv4.1 has several operations that are needed for | ||||
| pNFS servers, regardless of layout type or storage | ||||
| protocol. These operations are all sent to a metadata | ||||
| server and are summarized here. While pNFS is an <bcp14>OPTIONAL</bcp14> | ||||
| feature, if pNFS is implemented, some operations | ||||
| are <bcp14>REQUIRED</bcp14> in order to comply with pNFS. See <xref target="operation_mandlist" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| These are the fore channel pNFS operations: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>GETDEVICEINFO</dt> | ||||
| <dd> | ||||
| (<xref target="OP_GETDEVICEINFO" format="default"/>), as noted previously | ||||
| (<xref target="device_ids" format="default"/>), returns the mapping of device ID to | ||||
| storage device address. | ||||
| </dd> | ||||
| <dt>GETDEVICELIST</dt> | ||||
| <dd> | ||||
| (<xref target="OP_GETDEVICELIST" format="default"/>) | ||||
| allows clients to fetch all device IDs | ||||
| for a specific file system. | ||||
| </dd> | ||||
| <dt>LAYOUTGET</dt> | ||||
| <dd> | ||||
| (<xref target="OP_LAYOUTGET" format="default"/>) is used by a client to get | ||||
| a layout for a file. | ||||
| </dd> | ||||
| <dt>LAYOUTCOMMIT</dt> | ||||
| <dd> | ||||
| (<xref target="OP_LAYOUTCOMMIT" format="default"/>) is used | ||||
| to inform the metadata server of the client's intent to commit data | ||||
| that has been written to the storage device (the storage device as | ||||
| originally indicated in the return value of LAYOUTGET). | ||||
| </dd> | ||||
| <dt>LAYOUTRETURN</dt> | ||||
| <dd> | ||||
| (<xref target="OP_LAYOUTRETURN" format="default"/>) is used | ||||
| to return layouts for a file, a file system ID (FSID), or a client ID. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| These are the backchannel pNFS operations: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>CB_LAYOUTRECALL</dt> | ||||
| <dd> | ||||
| (<xref target="OP_CB_LAYOUTRECALL" format="default"/>) recalls | ||||
| a layout, all layouts belonging to a file system, or all | ||||
| layouts belonging to a client ID. | ||||
| </dd> | ||||
| <dt>CB_RECALL_ANY</dt> | ||||
| <dd> | ||||
| (<xref target="OP_CB_RECALL_ANY" format="default"/>) | ||||
| tells a client that it needs to return some number of recallable | ||||
| objects, including layouts, to the metadata server. | ||||
| </dd> | ||||
| <dt>CB_RECALLABLE_OBJ_AVAIL</dt> | ||||
| <dd> | ||||
| (<xref target="OP_CB_RECALLABLE_OBJ_AVAIL" format="default"/>) tells a client | ||||
| that a recallable object that it was denied (in the case of | ||||
| pNFS, a layout denied by LAYOUTGET) due to resource exhaustion | ||||
| is now available. | ||||
| </dd> | ||||
| <dt>CB_NOTIFY_DEVICEID</dt> | ||||
| <dd> | ||||
| (<xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>) notifies the client of | ||||
| changes to device IDs. | ||||
| </dd> | ||||
| </dl> | ||||
| </section> | ||||
| <section anchor="pnfs_attr" numbered="true" toc="default"> | ||||
| <name>pNFS Attributes</name> | ||||
| <t> | ||||
| A number of attributes specific to pNFS are listed and described in | ||||
| <xref target="pnfs_attr_full" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Layout Semantics</name> | ||||
| <section anchor="layout_semantics" numbered="true" toc="default"> | ||||
| <name>Guarantees Provided by Layouts</name> | ||||
| <t> | ||||
| Layouts grant to the client the ability to access data located at | ||||
| a storage device with the appropriate storage protocol. The client | ||||
| is guaranteed the layout will be recalled when one of two things | ||||
| occurs: either a conflicting layout is requested or the state | ||||
| encapsulated by the layout becomes invalid (this can happen when | ||||
| an event directly or indirectly modifies the layout). When a layout | ||||
| is recalled and returned by the client, the client continues with | ||||
| the ability to access file data with normal NFSv4.1 operations | ||||
| through the metadata server. Only the ability to access the storage | ||||
| devices is affected. | ||||
| </t> | ||||
| <t> | ||||
| The requirement of NFSv4.1 that all user access rights <bcp14>MUST</bcp14> be | ||||
| obtained through the appropriate OPEN, LOCK, and ACCESS operations | ||||
| is not modified with the existence of layouts. Layouts are provided | ||||
| to NFSv4.1 clients, and user access still follows the rules of the | ||||
| protocol as if they did not exist. It is a requirement that for a | ||||
| client to access a storage device, a layout must be held by the | ||||
| client. If a storage device receives an I/O request for a byte-range for | ||||
| which the client does not hold a layout, the storage device <bcp14>SHOULD</bcp14> | ||||
| reject that I/O request. Note that the act of modifying a file for | ||||
| which a layout is held does not necessarily conflict with the | ||||
| holding of the layout that describes the file being modified. | ||||
| Therefore, it is the requirement of the storage protocol or layout | ||||
| type that determines the necessary behavior. For example, | ||||
| block/volume layout types require that the layout's | ||||
| iomode agree with the type of I/O being performed. | ||||
| </t> | ||||
| <t> | ||||
| Depending upon the layout type and storage protocol in use, storage | ||||
| device access permissions may be granted by LAYOUTGET and may be | ||||
| encoded within the type-specific layout. For an example of storage | ||||
| device access permissions, see an object-based protocol such as <xref target="OSD-T10" format="default"/>. If access permissions are encoded within the | ||||
| layout, the metadata server <bcp14>SHOULD</bcp14> recall the layout when those | ||||
| permissions become invalid for any reason -- for example, when a file | ||||
| becomes unwritable or inaccessible to a client. Note that clients are | ||||
| still required to perform the appropriate | ||||
| OPEN, LOCK, and ACCESS operations as described above. The degree to which it is | ||||
| possible for the client to circumvent these operations and | ||||
| the consequences of doing so must be clearly specified by the | ||||
| individual layout type specifications. In addition, these | ||||
| specifications must be clear about the requirements and | ||||
| non-requirements for the checking performed by the server. | ||||
| </t> | ||||
| <t> | ||||
| In the presence of pNFS functionality, mandatory byte-range locks <bcp14>MUST</bcp14> | ||||
| behave as they would without pNFS. Therefore, if mandatory file | ||||
| locks and layouts are provided simultaneously, the storage device | ||||
| <bcp14>MUST</bcp14> be able to enforce the mandatory byte-range locks. For example, if | ||||
| one client obtains a mandatory byte-range lock and a second client accesses the | ||||
| storage device, the storage device <bcp14>MUST</bcp14> appropriately restrict I/O | ||||
| for the range of the mandatory byte-range lock. If the storage | ||||
| device is incapable of providing this check in the presence of | ||||
| mandatory byte-range locks, then the metadata server <bcp14>MUST NOT</bcp14> grant | ||||
| layouts and mandatory byte-range locks simultaneously. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="obtaining_layout" numbered="true" toc="default"> | ||||
| <name>Getting a Layout</name> | ||||
| <t> | ||||
| A client obtains a layout with the | ||||
| LAYOUTGET operation. The metadata server | ||||
| will grant layouts of a particular type | ||||
| (e.g., block/volume, object, or file). | ||||
| The client selects an appropriate layout | ||||
| type that the server supports and the client | ||||
| is prepared to use. The layout returned to | ||||
| the client might not exactly match the | ||||
| requested byte-range as described in <xref target="OP_LAYOUTGET_DESCRIPTION" format="default"/>. As needed, a client | ||||
| may send multiple LAYOUTGET operations; these might result | ||||
| in multiple overlapping, non-conflicting layouts (see | ||||
| <xref target="layout" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| In order to get a layout, the client must first have opened the file | ||||
| via the OPEN operation. When a client has no layout on a file, it | ||||
| <bcp14>MUST</bcp14> present an open stateid, a delegation stateid, or | ||||
| a byte-range lock stateid in the loga_stateid argument. A successful | ||||
| LAYOUTGET result includes a layout stateid. The first successful | ||||
| LAYOUTGET processed by the server using a non-layout stateid as an | ||||
| argument <bcp14>MUST</bcp14> have the "seqid" field of the layout stateid in the | ||||
| response set to one. Thereafter, the client <bcp14>MUST</bcp14> use a layout | ||||
| stateid (see <xref target="layout_stateid" format="default"/>) on future invocations | ||||
| of LAYOUTGET on the file, and the "seqid" <bcp14>MUST NOT</bcp14> be set to | ||||
| zero. Once the layout has been retrieved, it can be held across | ||||
| multiple OPEN and CLOSE sequences. Therefore, a client may hold a | ||||
| layout for a file that is not currently open by any user on the | ||||
| client. This allows for the caching of layouts beyond CLOSE. | ||||
| </t> | ||||
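| <t> | ||||
| The stateid selection rule above can be sketched as follows. This is only an illustration under stated assumptions: the names Stateid, FileState, and stateid_for_layoutget are hypothetical, and only an open stateid is modeled as the non-layout stateid (a delegation or byte-range lock stateid would be handled the same way). | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stateid:
    seqid: int
    other: bytes


@dataclass
class FileState:
    # Hypothetical per-file client bookkeeping, not a protocol structure.
    open_stateid: Optional[Stateid] = None
    layout_stateid: Optional[Stateid] = None


def stateid_for_layoutget(state: FileState) -> Stateid:
    """Pick the stateid to place in the loga_stateid argument."""
    if state.layout_stateid is not None:
        # Once a layout stateid exists it is the one to use,
        # and its seqid is never zero.
        if state.layout_stateid.seqid == 0:
            raise ValueError("layout stateid seqid must not be zero")
        return state.layout_stateid
    if state.open_stateid is None:
        # The file has to be opened before a layout can be requested.
        raise ValueError("file must be opened before LAYOUTGET")
    return state.open_stateid
```

| <t> | ||||
| In this sketch, the first LAYOUTGET on a file carries the open stateid, and every later LAYOUTGET carries the layout stateid, even if the file has since been closed. | ||||
| </t> | ||||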
| <t> | ||||
| The storage protocol used by the client to access the data on the | ||||
| storage device is determined by the layout's type. The client is | ||||
| responsible for matching the layout type with an available method to | ||||
| interpret and use the layout. The method for this layout type | ||||
| selection is outside the scope of the pNFS functionality. | ||||
| </t> | ||||
| <t> | ||||
| Although the metadata server is in control | ||||
| of the layout for a file, the pNFS client | ||||
| can provide hints to the server when a file | ||||
| is opened or created about the preferred | ||||
| layout type and aggregation schemes. | ||||
| pNFS introduces a layout_hint attribute (<xref target="attrdef_layout_hint" format="default"/>) | ||||
| that the client can set at file creation | ||||
| time to provide a hint to the server for new | ||||
| files. Setting this attribute separately, | ||||
| after the file has been created, might make | ||||
| it difficult, or impossible, for the server | ||||
| implementation to comply. | ||||
| </t> | ||||
| <t> | ||||
| Because the EXCLUSIVE4 createmode4 does not allow the | ||||
| setting of attributes at file creation time, NFSv4.1 | ||||
| introduces the EXCLUSIVE4_1 createmode4, which does | ||||
| allow attributes to be set at file creation time. In | ||||
| addition, if the session is created with persistent | ||||
| reply caches, EXCLUSIVE4_1 is neither necessary | ||||
| nor allowed. Instead, GUARDED4 both works better and is | ||||
| prescribed. <xref target="exclusive_create" format="default"/> in <xref target="OP_OPEN_DESCRIPTION" format="default"/> summarizes how a client | ||||
| is allowed to send an exclusive create. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="layout_stateid" numbered="true" toc="default"> | ||||
| <name>Layout Stateid</name> | ||||
| <t> | ||||
| As with all other stateids, the layout stateid consists of a "seqid" and | ||||
| "other" field. Once a layout stateid is established, the "other" field | ||||
| will stay constant unless the stateid is revoked or the client | ||||
| returns all layouts on the file and the server disposes of the | ||||
| stateid. The "seqid" field is initially set to one, and is never | ||||
| zero on any NFSv4.1 operation that uses layout stateids, whether it | ||||
| is a fore channel or backchannel operation. After the layout stateid | ||||
| is established, the server increments by one the value of the | ||||
| "seqid" in each subsequent LAYOUTGET and LAYOUTRETURN response, and | ||||
| in each CB_LAYOUTRECALL request. | ||||
| </t> | ||||
| <t> | ||||
| Given the design goal of pNFS to provide parallelism, the layout | ||||
| stateid differs from other stateid types in that the client is | ||||
| expected to send LAYOUTGET and LAYOUTRETURN operations in parallel. | ||||
| The "seqid" value is used by the client to properly sort responses | ||||
| to LAYOUTGET and LAYOUTRETURN. The "seqid" is also used to prevent | ||||
| race conditions between LAYOUTGET and CB_LAYOUTRECALL. Given that the | ||||
| processing rules differ between layout stateids and other stateid | ||||
| types, only the pNFS sections of this document should be considered | ||||
| to determine proper layout stateid handling. | ||||
| </t> | ||||
| <t> | ||||
| Once the client receives a layout stateid, it <bcp14>MUST</bcp14> use the correct | ||||
| "seqid" for subsequent LAYOUTGET or LAYOUTRETURN operations. The | ||||
| correct "seqid" is defined as the highest "seqid" value from | ||||
| responses of fully processed LAYOUTGET or LAYOUTRETURN operations or | ||||
| arguments of a fully processed CB_LAYOUTRECALL operation. Since the | ||||
| server is incrementing the "seqid" value on each layout operation, | ||||
| the client may determine the order of operation processing by | ||||
| inspecting the "seqid" value. In the case of overlapping layout | ||||
| ranges, the ordering information will provide the client the | ||||
| knowledge of which layout ranges are held. Note that overlapping | ||||
| layout ranges may occur because of the client's specific requests or | ||||
| because the server is allowed to expand the range of a requested | ||||
| layout and notify the client in the LAYOUTRETURN results. Additional | ||||
| layout stateid sequencing requirements are provided in | ||||
| <xref target="pnfs_operation_sequencing" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The client's receipt of a "seqid" is not sufficient for subsequent | ||||
| use. The client must fully process the operations before the | ||||
| "seqid" can be used. For LAYOUTGET results, if | ||||
| the client is not using the forgetful model | ||||
| (<xref target="recall_robustness" format="default"/>), it <bcp14>MUST</bcp14> first update its | ||||
| record of what ranges of the file's layout it has before using the | ||||
| seqid. For LAYOUTRETURN results, the client <bcp14>MUST</bcp14> delete the range | ||||
| from its record of what ranges of the file's layout it had before | ||||
| using the seqid. For CB_LAYOUTRECALL arguments, the client <bcp14>MUST</bcp14> send | ||||
| a response to the recall before using the seqid. | ||||
| The fundamental requirement in client | ||||
| processing is that the "seqid" is used to provide the order of | ||||
| processing. LAYOUTGET results may be processed in parallel. | ||||
| LAYOUTRETURN results may be processed in parallel. LAYOUTGET and | ||||
| LAYOUTRETURN responses may be processed in parallel as long as the | ||||
| ranges do not overlap. CB_LAYOUTRECALL requests <bcp14>MUST</bcp14> be | ||||
| processed in "seqid" order at all times. | ||||
| </t> | ||||
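| <t> | ||||
| The definition of the correct "seqid" (the highest value seen among fully processed operations) can be sketched as below. The class name LayoutSeqidTracker is hypothetical, and the sketch assumes the caller reports a seqid only after it has finished the processing described above. Seqid wraparound is deliberately not modeled. | ||||
| </t> | ||||

```python
class LayoutSeqidTracker:
    """Tracks the seqid a client should use in its next layout operation."""

    def __init__(self, initial_seqid: int = 1):
        # The first successful LAYOUTGET response carries seqid one.
        self.current = initial_seqid

    def fully_processed(self, seqid: int) -> int:
        """Record a seqid from a fully processed operation.

        Call this only after the client has updated its layout records
        (LAYOUTGET or LAYOUTRETURN results) or has replied to the recall
        (CB_LAYOUTRECALL arguments).
        """
        # Responses may be processed out of order, so keep the highest value.
        if seqid > self.current:
            self.current = seqid
        return self.current
```

| <t> | ||||
| Because only the maximum is retained, a LAYOUTGET response with seqid 2 that finishes processing after a response with seqid 3 does not move the tracked value backwards. | ||||
| </t> | ||||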
| <t> | ||||
| Once a client has no more layouts on a file, the layout stateid is | ||||
| no longer valid and <bcp14>MUST NOT</bcp14> be used. Any attempt to use such a | ||||
| layout stateid will result in NFS4ERR_BAD_STATEID. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="committing_layout" numbered="true" toc="default"> | ||||
| <name>Committing a Layout</name> | ||||
| <t> | ||||
| Allowing for varying storage protocol capabilities, the pNFS | ||||
| protocol does not require the metadata server and storage devices to | ||||
| have a consistent view of file attributes and data location | ||||
| mappings. Data location mapping refers to aspects such as which offsets | ||||
| store data as opposed to storing holes (see <xref target="sparse_dense" format="default"/> for a discussion). Related issues arise | ||||
| for storage protocols where a layout may hold provisionally | ||||
| allocated blocks where the allocation of those blocks does not | ||||
| survive a complete restart of both the client and server. Because | ||||
| of this inconsistency, it is necessary to resynchronize the client | ||||
| with the metadata server and its storage devices and make any | ||||
| potential changes available to other clients. This is accomplished | ||||
| by use of the LAYOUTCOMMIT operation. | ||||
| </t> | ||||
| <t> | ||||
| The LAYOUTCOMMIT operation is responsible for committing a modified | ||||
| layout to the metadata server. The data should be written | ||||
| and committed to the appropriate storage devices before the | ||||
| LAYOUTCOMMIT occurs. The | ||||
| scope of the LAYOUTCOMMIT operation depends on the storage protocol | ||||
| in use. It is important to note that the level of | ||||
| synchronization is from the point of view of the client that sent | ||||
| the LAYOUTCOMMIT. The updated state on the metadata server need | ||||
| only reflect the state as of the client's last operation previous to | ||||
| the LAYOUTCOMMIT. The metadata server is not <bcp14>REQUIRED</bcp14> to maintain a global view | ||||
| that accounts for other clients' I/O that may have occurred within | ||||
| the same time frame. | ||||
| </t> | ||||
| <t> | ||||
| For block/volume-based layouts, LAYOUTCOMMIT may require | ||||
| updating the block list that comprises the file and committing this | ||||
| layout to stable storage. For file-based layouts, synchronization of | ||||
| attributes between the metadata and storage devices, primarily the | ||||
| size attribute, is required. | ||||
| </t> | ||||
| <t> | ||||
| The control protocol is free to synchronize the attributes before | ||||
| it receives a LAYOUTCOMMIT; however, upon successful completion of a | ||||
| LAYOUTCOMMIT, state that exists on the metadata server that | ||||
| describes the file <bcp14>MUST</bcp14> be synchronized with the state that exists on the | ||||
| storage devices that comprise that file as of the client's | ||||
| last sent operation. Thus, a client that queries the size of a file | ||||
| between a WRITE to a storage device and the LAYOUTCOMMIT might observe | ||||
| a size that does not reflect the actual data written. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>MUST</bcp14> have a layout in order to send a LAYOUTCOMMIT operation. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>LAYOUTCOMMIT and change/time_modify</name> | ||||
| <t> | ||||
| The change and time_modify attributes may be updated | ||||
| by the server when the LAYOUTCOMMIT operation is processed. The | ||||
| reason for this is that some layout types do not support the update | ||||
| of these attributes when the storage devices process I/O operations. | ||||
| If a client has a layout with the LAYOUTIOMODE4_RW iomode on the file, | ||||
| the client <bcp14>MAY</bcp14> provide a suggested value to the server for | ||||
| time_modify within the arguments to LAYOUTCOMMIT. | ||||
| Based on the layout type, the provided value may or may not be used. | ||||
| The server should sanity-check the client-provided values | ||||
| before they are used. For example, the server should ensure that | ||||
| time does not flow backwards. The client always has the option to | ||||
| set time_modify through an explicit SETATTR operation. | ||||
| </t> | ||||
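| <t> | ||||
| One possible form of the sanity check mentioned above is sketched here. The function name accept_time_modify and the representation of time as an integer count of nanoseconds are assumptions made for illustration; a real server may apply further checks. | ||||
| </t> | ||||

```python
from typing import Optional


def accept_time_modify(current: int, suggested: Optional[int]) -> int:
    """Return the time_modify value to keep after a LAYOUTCOMMIT.

    `current` is the server's existing time_modify and `suggested` is the
    optional client-provided value, both as integer nanoseconds (an
    assumption of this sketch).
    """
    if suggested is None:
        # The client offered no value; keep what the server has.
        return current
    if suggested < current:
        # Time must not flow backwards; ignore the suggestion.
        return current
    return suggested
```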
| <t> | ||||
| For some layout protocols, the storage device is able to notify the | ||||
| metadata server of the occurrence of an I/O; as a result, the | ||||
| change and time_modify attributes may be updated at | ||||
| the metadata server. For a metadata server that is capable of | ||||
| monitoring updates to the change and time_modify | ||||
| attributes, LAYOUTCOMMIT processing is not required to update the | ||||
| change attribute. In this case, the metadata server must ensure that | ||||
| no further update to the data has occurred since the last update of | ||||
| the attributes; file-based protocols may have enough information to | ||||
| make this determination or may update the change attribute upon each | ||||
| file modification. This also applies for the time_modify | ||||
| attribute. If the server implementation is able to | ||||
| determine that the file has not been modified since the last | ||||
| time_modify update, the server need not update time_modify at | ||||
| LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes | ||||
| should be visible if that file was modified since the latest | ||||
| previous LAYOUTCOMMIT or LAYOUTGET. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="general_layoutcommit" numbered="true" toc="default"> | ||||
| <name>LAYOUTCOMMIT and size</name> | ||||
| <t> | ||||
| The size of a file may be updated when the LAYOUTCOMMIT operation is | ||||
| used by the client. One of the fields in the argument to | ||||
| LAYOUTCOMMIT is loca_last_write_offset; this field indicates the | ||||
| highest byte offset written but not yet committed with the | ||||
| LAYOUTCOMMIT operation. The data type of loca_last_write_offset is | ||||
| newoffset4 and is switched on a boolean value, no_newoffset, that | ||||
| indicates if a previous write occurred or not. If no_newoffset is | ||||
| FALSE, an offset is not given. If the client has a layout with | ||||
| LAYOUTIOMODE4_RW iomode on the file, with a byte-range (denoted by the values of lo_offset and lo_length) | ||||
| that overlaps loca_last_write_offset, then the client <bcp14>MAY</bcp14> | ||||
| set no_newoffset to TRUE and provide an offset that will | ||||
| update the file size. Keep in mind that offset is not the same | ||||
| as length, though they are related. For example, a loca_last_write_offset | ||||
| value of zero means that one byte was written at offset zero, and so | ||||
| the length of the file is at least one byte. | ||||
| </t> | ||||
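| <t> | ||||
| The client-side decision described above can be sketched as follows. The helper names are hypothetical, newoffset4 is modeled as a plain (no_newoffset, offset) pair, and the caller is assumed to pass the byte-range of a layout it holds with the LAYOUTIOMODE4_RW iomode. The sketch also assumes that an all-ones lo_length means the layout extends through the end of the file. | ||||
| </t> | ||||

```python
from typing import Optional, Tuple

NFS4_UINT64_MAX = 2**64 - 1  # all-ones length: "through end of file" here


def rw_layout_covers(lo_offset: int, lo_length: int, offset: int) -> bool:
    """True if the byte at `offset` lies inside the layout's byte-range."""
    if offset < lo_offset:
        return False
    if lo_length == NFS4_UINT64_MAX:
        return True
    return offset < lo_offset + lo_length


def build_last_write_offset(
    lo_offset: int, lo_length: int, last_write_offset: Optional[int]
) -> Tuple[bool, Optional[int]]:
    """Model loca_last_write_offset as a (no_newoffset, offset) pair.

    TRUE carries an offset; FALSE carries none.
    """
    if last_write_offset is not None and rw_layout_covers(
        lo_offset, lo_length, last_write_offset
    ):
        return (True, last_write_offset)
    return (False, None)
```

| <t> | ||||
| Note that the value carried is an offset, not a length: a pair of (True, 0) tells the server that the file is at least one byte long. | ||||
| </t> | ||||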
| <t> | ||||
| The metadata server may do one of the following: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Update the file's size using the last write offset provided by | ||||
| the client as either the true file size or as a hint of the file | ||||
| size. If the metadata server has a method available, any new | ||||
| value for file size should be sanity-checked. For example, the | ||||
| file must not be truncated if the client presents a last write | ||||
| offset less than the file's current size. | ||||
| </li> | ||||
| <li> | ||||
| Ignore the client-provided last write offset; the metadata | ||||
| server must have sufficient knowledge from other sources to | ||||
| determine the file's size. For example, the metadata server | ||||
| queries the storage devices with the control protocol. | ||||
| </li> | ||||
| </ol> | ||||
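| <t> | ||||
| The first option, including its sanity check against truncation, might look like the sketch below. The function name size_after_layoutcommit is hypothetical, and treating the last write offset as an exact size (rather than only a hint) is an assumption of this example. | ||||
| </t> | ||||

```python
from typing import Optional


def size_after_layoutcommit(
    current_size: int, last_write_offset: Optional[int]
) -> Optional[int]:
    """Return the new file size if LAYOUTCOMMIT changes it, else None.

    `last_write_offset` is None when the client supplied no offset.
    """
    if last_write_offset is None:
        return None
    # A byte written at offset N implies a length of at least N + 1.
    candidate = last_write_offset + 1
    if candidate <= current_size:
        # Never truncate the file based on a smaller last write offset.
        return None
    return candidate
```

| <t> | ||||
| Returning None when the size is unchanged mirrors the way the LAYOUTCOMMIT result carries a new size only when the size was actually set. | ||||
| </t> | ||||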
| <t> | ||||
| The method chosen to update the file's size will depend on the | ||||
| storage device's and/or the control protocol's capabilities. For | ||||
| example, if the storage devices are block devices with no knowledge | ||||
| of file size, the metadata server must rely on the client to set the | ||||
| last write offset appropriately. | ||||
| </t> | ||||
| <t> | ||||
| The results of LAYOUTCOMMIT contain a new size value in the form of | ||||
| a newsize4 union data type. If the file's size is set as a result | ||||
| of LAYOUTCOMMIT, the metadata server must reply with the new size; | ||||
| otherwise, the new size is not provided. | ||||
| If the file size is updated, the metadata server <bcp14>SHOULD</bcp14> update the | ||||
| storage devices such that the new file size is reflected when | ||||
| LAYOUTCOMMIT processing is complete. For example, the client should | ||||
| be able to read up to the new file size. | ||||
| </t> | ||||
| <t> | ||||
| The client can extend the length of a file | ||||
| or truncate a file by sending a SETATTR operation to the metadata server | ||||
| with the size attribute specified. If the size specified is larger than | ||||
| the current size of the file, the file is "zero extended", i.e., zeros are | ||||
| implicitly added between the file's previous EOF and the new EOF. | ||||
| (In many implementations, the zero-extended byte-range | ||||
| of the file consists of unallocated | ||||
| holes in the file.) When the client writes past EOF via WRITE, | ||||
| the SETATTR operation does not need to be used. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="layoutcommit_update" numbered="true" toc="default"> | ||||
| <name>LAYOUTCOMMIT and layoutupdate</name> | ||||
| <t> | ||||
| The LAYOUTCOMMIT argument contains a loca_layoutupdate field (<xref target="OP_LAYOUTCOMMIT_ARGUMENT" format="default"/>) of data type layoutupdate4 | ||||
| (<xref target="layoutupdate4" format="default"/>). This argument is a | ||||
| layout-type-specific structure. The structure can be used to pass | ||||
| arbitrary layout-type-specific information from the client to the | ||||
| metadata server at LAYOUTCOMMIT time. For example, if using a | ||||
| block/volume layout, the client can indicate to the metadata server | ||||
| which reserved or allocated blocks the client used or did not use. | ||||
| The content of loca_layoutupdate (field lou_body) need not be the | ||||
| same layout-type-specific content returned by LAYOUTGET (<xref target="OP_LAYOUTGET_RESULT" format="default"/>) in the loc_body field of the | ||||
| lo_content field of the logr_layout field. | ||||
| The content of | ||||
| loca_layoutupdate is defined by the layout type specification and is | ||||
| opaque to LAYOUTCOMMIT. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] Layout Semantics --> | ||||
| <section anchor="recalling_layout" numbered="true" toc="default"> | ||||
| <name>Recalling a Layout</name> | ||||
| <t> | ||||
| Since a layout protects a client's access to a file via a direct | ||||
| client-storage-device path, a layout need only be recalled when it | ||||
| is semantically unable to serve this function. Typically, this | ||||
| occurs when the layout no longer encapsulates the true location of | ||||
| the file over the byte-range it represents. Any operation or | ||||
| action, such as server-driven restriping or load balancing, that | ||||
| changes the layout will result in a recall of the layout. A layout | ||||
| is recalled by the CB_LAYOUTRECALL callback operation (see <xref target="OP_CB_LAYOUTRECALL" format="default"/>) and returned with LAYOUTRETURN (see <xref target="OP_LAYOUTRETURN" format="default"/>). The CB_LAYOUTRECALL operation may | ||||
| recall a layout identified by a byte-range, all layouts | ||||
| associated with a file system ID (FSID), or all layouts associated with | ||||
| a client ID. | ||||
| <xref target="pnfs_operation_sequencing" format="default"/> discusses sequencing issues | ||||
| surrounding the getting, returning, and recalling of layouts. | ||||
| </t> | ||||
| <t> | ||||
| An iomode is also specified when recalling a layout. | ||||
| Generally, the iomode in the recall request must match the layout | ||||
| being returned; for example, a recall with an iomode of | ||||
| LAYOUTIOMODE4_RW should cause the client to only return | ||||
| LAYOUTIOMODE4_RW layouts and not LAYOUTIOMODE4_READ layouts. | ||||
| However, a special LAYOUTIOMODE4_ANY enumeration is | ||||
| defined to enable recalling a layout of any iomode; in other words, | ||||
| the client must return both LAYOUTIOMODE4_READ and LAYOUTIOMODE4_RW layouts. | ||||
| </t> | ||||
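| <t> | ||||
| The iomode matching rule can be sketched as a simple filter. The Layout record and the function layouts_to_return are hypothetical, and byte-range matching is left out so that only the iomode comparison is shown. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class LayoutIomode(IntEnum):
    READ = 1  # LAYOUTIOMODE4_READ
    RW = 2    # LAYOUTIOMODE4_RW
    ANY = 3   # LAYOUTIOMODE4_ANY


@dataclass(frozen=True)
class Layout:
    # Hypothetical client-side record of one held layout segment.
    iomode: LayoutIomode
    offset: int
    length: int


def layouts_to_return(held: List[Layout], recall_iomode: LayoutIomode) -> List[Layout]:
    """Select the held layouts that a recall with `recall_iomode` covers."""
    if recall_iomode == LayoutIomode.ANY:
        # ANY recalls layouts of every iomode.
        return list(held)
    return [layout for layout in held if layout.iomode == recall_iomode]
```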
| <t> | ||||
| A REMOVE operation <bcp14>SHOULD</bcp14> cause the metadata server to recall the | ||||
| layout to prevent the client from accessing a non-existent file and | ||||
| to reclaim state stored on the client. Since a REMOVE may be delayed | ||||
| until the last close of the file has occurred, the recall may also | ||||
| be delayed until this time. After the last reference on the file | ||||
| has been released and the file has been removed, the client should | ||||
| no longer be able to perform I/O using the layout. In the case of a | ||||
| file-based layout, the data server <bcp14>SHOULD</bcp14> return NFS4ERR_STALE in | ||||
| response to any operation on the removed file. | ||||
| </t> | ||||
| <t> | ||||
| Once a layout has been returned, the client <bcp14>MUST NOT</bcp14> send I/Os to | ||||
| the storage devices for the file, byte-range, and iomode | ||||
| represented by the returned layout. If a client does send an I/O to | ||||
| a storage device for which it does not hold a layout, the storage | ||||
| device <bcp14>SHOULD</bcp14> reject the I/O. | ||||
| </t> | ||||
| <t anchor="pnfs_and_delegations"> | ||||
| Although pNFS does not alter the file data caching capabilities of | ||||
| clients, or their semantics, it recognizes that some clients may | ||||
| perform more aggressive write-behind caching to optimize the | ||||
| benefits provided by pNFS. However, write-behind caching may | ||||
| negatively affect the latency in returning a layout in response to a | ||||
| CB_LAYOUTRECALL; this is similar to file delegations and the impact | ||||
| that file data caching has on DELEGRETURN. Client implementations | ||||
| <bcp14>SHOULD</bcp14> limit the amount of unwritten data they have outstanding at | ||||
| any one time in order to prevent excessively long responses to | ||||
| CB_LAYOUTRECALL. Once a layout is recalled, a server <bcp14>MUST</bcp14> wait one | ||||
| lease period before taking further action. As soon as a lease | ||||
| period has passed, the server may choose to fence the client's access | ||||
| to the storage devices if the server perceives the client has taken | ||||
| too long to return a layout. However, just as in the case of data | ||||
| delegation and DELEGRETURN, the server may choose to wait, given that | ||||
| the client is showing forward progress on its way to returning the | ||||
| layout. This forward progress can take the form of successful | ||||
| interaction with the storage devices or of sub-portions of the layout | ||||
| being returned by the client. The server can also limit exposure to | ||||
| these problems by limiting the byte-ranges initially provided in | ||||
| the layouts and thus the amount of outstanding modified data. | ||||
| </t> | ||||
| <section anchor="recall_robustness" numbered="true" toc="default"> | ||||
| <name>Layout Recall Callback Robustness</name> | ||||
| <t> | ||||
| It has been assumed thus far that pNFS client | ||||
| state | ||||
| (layout ranges and iomode) | ||||
| for a file exactly matches that of the pNFS server for that file. | ||||
| This assumption | ||||
| leads to the implication that any callback results in a | ||||
| LAYOUTRETURN or set of LAYOUTRETURNs that exactly match the range in | ||||
| the callback, since both client and server agree about the state | ||||
| being maintained. However, it can be useful if this assumption does | ||||
| not always hold. For example: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If conflicts that require | ||||
| callbacks are very rare, and a server can use a multi-file callback | ||||
| to recover per-client resources (e.g., via an FSID recall or a | ||||
| multi-file recall within a single CB_COMPOUND), the result may be | ||||
| significantly less client-server pNFS traffic. | ||||
| </li> | ||||
| <li> | ||||
| It may be useful for servers to maintain information about | ||||
| what ranges are held by a client on a coarse-grained basis, leading | ||||
| to the server's layout ranges being beyond those actually held by | ||||
| the client. | ||||
| In the extreme, a server could manage conflicts on | ||||
| a per-file basis, only sending whole-file callbacks even though | ||||
| clients may request and be granted sub-file ranges. | ||||
| </li> | ||||
| <li> | ||||
| It may be useful for clients to "forget" details about | ||||
| what layouts and ranges the client actually has, leading | ||||
| to the server's layout ranges being beyond those that the | ||||
| client "thinks" it has. As long as the client does not | ||||
| assume it has layouts that are beyond what the server | ||||
| has granted, this is a safe practice. When a client | ||||
| forgets what ranges and layouts it has, and it receives | ||||
| a CB_LAYOUTRECALL operation, the client <bcp14>MUST</bcp14> follow up | ||||
| with a LAYOUTRETURN for what the server recalled, or | ||||
| alternatively return the NFS4ERR_NOMATCHING_LAYOUT error | ||||
| if it has no layout to return in the recalled range. | ||||
| </li> | ||||
| <li> | ||||
| In order to avoid errors, it is vital that a client not assign | ||||
| itself layout permissions beyond what the server has granted, and | ||||
| that the server not forget layout permissions that have been granted. | ||||
| On the other hand, if a | ||||
| server believes that a client holds a layout that the client | ||||
| does not know about, it is useful for the client to cleanly indicate | ||||
| completion of the requested recall either by sending a LAYOUTRETURN | ||||
| operation for the entire requested range or by returning an | ||||
| NFS4ERR_NOMATCHING_LAYOUT error to the CB_LAYOUTRECALL. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Thus, in light of the above, it is useful for a server to be able to | ||||
| send callbacks for layout ranges it has not granted to a client, | ||||
| and for a client to return ranges it does not hold. A pNFS client | ||||
| <bcp14>MUST</bcp14> always return layouts that comprise the full range | ||||
| specified by the recall. Note that the full recalled layout range need | ||||
| not be returned as part of a single operation, but may be returned | ||||
| in portions. This allows the client to stage the flushing of dirty | ||||
| data and commits and returns of layouts. | ||||
| Also, it indicates to the | ||||
| metadata server that the client is making progress. | ||||
| </t> | ||||
| <t> | ||||
| When a layout is returned, the client <bcp14>MUST NOT</bcp14> have any outstanding | ||||
| I/O requests to the storage devices involved in the layout. | ||||
| Rephrasing, the client <bcp14>MUST NOT</bcp14> return the layout while it has | ||||
| outstanding I/O requests to the storage device. | ||||
| </t> | ||||
| <t> | ||||
| Even with this requirement for the client, it is possible that I/O | ||||
| requests may be presented to a storage device no longer allowed to | ||||
| perform them. Since the server has no strict control as to when the | ||||
| client will return the layout, the server may later decide to | ||||
| unilaterally revoke the client's access to the storage devices | ||||
| as provided by the layout. In | ||||
| choosing to revoke access, the server must deal with the possibility | ||||
| of lingering I/O requests, i.e., I/O requests that are | ||||
| still in flight to | ||||
| storage devices identified by the revoked layout. | ||||
| All layout type specifications <bcp14>MUST</bcp14> define whether unilateral layout revocation by | ||||
| the metadata server is supported; if it is, the specification must | ||||
| also describe how lingering writes are processed. For example, | ||||
| storage devices identified by the revoked layout could be fenced off | ||||
| from the client that held the layout. | ||||
| </t> | ||||
| <t> | ||||
| In order to ensure client/server convergence with regard to layout state, | ||||
| the final LAYOUTRETURN operation in a sequence of LAYOUTRETURN | ||||
| operations for a particular recall <bcp14>MUST</bcp14> specify the entire range | ||||
| being recalled, echoing the recalled layout type, iomode, | ||||
| recall/return type (FILE, FSID, or ALL), and byte-range, even if | ||||
| layouts pertaining to partial ranges were previously | ||||
| returned. In addition, if the client holds no layouts that | ||||
| overlap the range being recalled, the client should return the | ||||
| NFS4ERR_NOMATCHING_LAYOUT error code to CB_LAYOUTRECALL. This | ||||
| allows the server to update its view of the client's layout state. | ||||
| </t> | ||||
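| <t> | ||||
| The convergence rule above can be sketched as follows. This is a minimal illustration, not protocol text: the (offset, length) tuple representation and the names plan_recall_response, overlaps, and clip are hypothetical, and lengths are assumed to be finite rather than "to end of file". | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import List, Tuple, Union

NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"

Range = Tuple[int, int]  # (offset, length); hypothetical representation


def overlaps(a: Range, b: Range) -> bool:
    """True when the two byte-ranges share at least one byte."""
    return a[0] + a[1] > b[0] and b[0] + b[1] > a[0]


def clip(seg: Range, recalled: Range) -> Range:
    """Restrict a held segment to the recalled byte-range."""
    start = max(seg[0], recalled[0])
    end = min(seg[0] + seg[1], recalled[0] + recalled[1])
    return (start, end - start)


def plan_recall_response(recalled: Range,
                         held: List[Range]) -> Union[str, List[Range]]:
    """Return the callback error name, or the LAYOUTRETURN ranges in order."""
    clipped = [clip(seg, recalled) for seg in held if overlaps(seg, recalled)]
    if not clipped:
        # No held layout overlaps the recall: reply to CB_LAYOUTRECALL with
        # the error so the server can update its view of the client's state.
        return NFS4ERR_NOMATCHING_LAYOUT
    # Staged partial returns come first; the final LAYOUTRETURN echoes the
    # entire recalled range even though portions were already returned.
    staged = [seg for seg in clipped if seg != recalled]
    return staged + [recalled]
```
| ]]></sourcecode> | ||||
| <t> | ||||
| For a recall of bytes 0-99 against a held segment covering bytes 50-149, the sketch plans a partial return of (50, 50) followed by a final return of (0, 100). | ||||
| </t> | ||||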
| </section> | ||||
| <section anchor="pnfs_operation_sequencing" numbered="true" toc="default"> | ||||
| <name>Sequencing of Layout Operations</name> | ||||
| <t> | ||||
| As with other stateful operations, pNFS requires the correct | ||||
| sequencing of layout operations. pNFS uses the "seqid" in the | ||||
| layout stateid to provide the correct sequencing between regular | ||||
| operations and callbacks. It is the server's responsibility to | ||||
| avoid inconsistencies regarding the layouts provided and the | ||||
| client's responsibility to properly serialize its layout requests | ||||
| and layout returns. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Layout Recall and Return Sequencing</name> | ||||
| <t> | ||||
| One critical issue with regard to layout operations sequencing | ||||
| concerns callbacks. The protocol must defend against | ||||
| races between the reply to a LAYOUTGET or LAYOUTRETURN | ||||
| operation and a subsequent CB_LAYOUTRECALL. A client | ||||
| <bcp14>MUST NOT</bcp14> process a CB_LAYOUTRECALL that implies one or | ||||
| more outstanding LAYOUTGET or LAYOUTRETURN operations to | ||||
| which the client has not yet received a reply. The client | ||||
| detects such a CB_LAYOUTRECALL by examining the "seqid" | ||||
| field of the recall's layout stateid. If the "seqid" | ||||
| is not exactly one higher than what the client currently has recorded, and the | ||||
| client has at least one LAYOUTGET and/or LAYOUTRETURN operation | ||||
| outstanding, the client knows the server sent the CB_LAYOUTRECALL | ||||
| after sending a response to an outstanding LAYOUTGET or LAYOUTRETURN. | ||||
| The client <bcp14>MUST</bcp14> wait before processing such a CB_LAYOUTRECALL | ||||
| until it processes all replies for outstanding LAYOUTGET and | ||||
| LAYOUTRETURN operations for the corresponding file | ||||
| with seqid less than the seqid given by CB_LAYOUTRECALL | ||||
| (lor_stateid; see <xref target="OP_CB_LAYOUTRECALL" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| In addition to the seqid-based mechanism, | ||||
| <xref target="sessions_callback_races" format="default"/> | ||||
| describes the sessions mechanism for allowing the | ||||
| client to detect callback race conditions and delay processing such a | ||||
| CB_LAYOUTRECALL. The server <bcp14>MAY</bcp14> reference conflicting operations | ||||
| in the CB_SEQUENCE that precedes the CB_LAYOUTRECALL. | ||||
| Because the server has already sent replies for these operations before | ||||
| sending the callback, the replies may race with the CB_LAYOUTRECALL. | ||||
| The client <bcp14>MUST</bcp14> wait for all the referenced calls to complete and update | ||||
| its view of the layout state before processing the CB_LAYOUTRECALL. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Get/Return Sequencing</name> | ||||
| <t> | ||||
| The protocol allows the client to send concurrent | ||||
| LAYOUTGET and LAYOUTRETURN operations to the server. The | ||||
| protocol does not provide any means for the server to | ||||
| process the requests in the same order in which they | ||||
| were created. However, through the use of the "seqid" | ||||
| field in the layout stateid, the client can determine | ||||
| the order in which parallel outstanding operations were | ||||
| processed by the server. Thus, when a layout retrieved | ||||
| by an outstanding LAYOUTGET operation intersects with | ||||
| a layout returned by an outstanding LAYOUTRETURN on | ||||
| the same file, the order in which the two conflicting | ||||
| operations are processed determines the final state of | ||||
| the overlapping layout. The order is determined by | ||||
| the "seqid" returned in each operation: the operation with the | ||||
| higher seqid was executed later. | ||||
| </t> | ||||
| <t> | ||||
| It is permissible for the client to send multiple parallel | ||||
| LAYOUTGET operations for the same file or multiple parallel LAYOUTRETURN | ||||
| operations for the same file or a mix of both. | ||||
| </t> | ||||
| <t> | ||||
| It is permissible for the client to use the current stateid (see | ||||
| <xref target="current_stateid" format="default"/>) for LAYOUTGET operations, for | ||||
| example, when compounding LAYOUTGETs or compounding OPEN and | ||||
| LAYOUTGETs. It is also permissible to use the current stateid when | ||||
| compounding LAYOUTRETURNs. | ||||
| </t> | ||||
| <t> | ||||
| It is permissible for the client to use the current stateid when | ||||
| combining LAYOUTRETURN and LAYOUTGET operations for the same file in | ||||
| the same COMPOUND request since the server <bcp14>MUST</bcp14> process these in | ||||
| order. However, if a client does send such COMPOUND requests, it | ||||
| <bcp14>MUST NOT</bcp14> have more than one outstanding for the same file at the | ||||
| same time, and it <bcp14>MUST NOT</bcp14> have other LAYOUTGET or LAYOUTRETURN | ||||
| operations outstanding at the same time for that same file. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Client Considerations</name> | ||||
| <t> | ||||
| Consider a pNFS client that has sent a LAYOUTGET, and before | ||||
| it receives the reply to LAYOUTGET, it receives | ||||
| a CB_LAYOUTRECALL for the same file with an overlapping range. There are two | ||||
| possibilities, which the client can distinguish | ||||
| via the layout stateid in the recall. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The server processed the LAYOUTGET before sending the recall, so the | ||||
| LAYOUTGET must be waited for because it | ||||
| may be carrying layout information that will need to be returned to deal | ||||
| with the CB_LAYOUTRECALL. | ||||
| </li> | ||||
| <li> | ||||
| The | ||||
| server sent the callback before receiving the | ||||
| LAYOUTGET. The server will not respond to the LAYOUTGET | ||||
| until the CB_LAYOUTRECALL is processed. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| If these possibilities cannot be distinguished, a | ||||
| deadlock could result, as the client must wait for the | ||||
| LAYOUTGET response before processing the recall in the | ||||
| first case, but that response will not arrive until after | ||||
| the recall is processed in the second case. Note that | ||||
| in the first case, the "seqid" in the layout stateid | ||||
| of the recall is two greater than what the client has | ||||
| recorded; in the second case, the "seqid" is one greater than | ||||
| what the client has recorded. This allows the client | ||||
| to disambiguate between the two cases. The client thus | ||||
| knows precisely which possibility applies. | ||||
| </t> | ||||
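| <t> | ||||
| The disambiguation described above might look like the following on the client. This is a sketch under stated assumptions: the function name and the returned action strings are hypothetical, and plain integer subtraction is used, so "seqid" wraparound is ignored. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def classify_recall(recorded_seqid: int, recall_seqid: int,
                    outstanding_ops: int) -> str:
    """Decide how a client treats a CB_LAYOUTRECALL for a file.

    recorded_seqid  -- layout stateid "seqid" the client has recorded
    recall_seqid    -- "seqid" carried in the recall's layout stateid
    outstanding_ops -- LAYOUTGET/LAYOUTRETURN operations awaiting a reply
    """
    delta = recall_seqid - recorded_seqid
    if outstanding_ops == 0 or delta == 1:
        # Case 2: the recall was sent before the server saw the pending
        # operation; waiting for that reply would deadlock.
        return "process_now"
    # Case 1: the server already replied to a pending operation, so the
    # client waits for the reply (or answers the callback with NFS4ERR_DELAY).
    return "wait_for_replies"
```
| ]]></sourcecode> | ||||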
| <t> | ||||
| In case 1, the client knows it needs to wait for | ||||
| the LAYOUTGET response before processing the recall | ||||
| (or the client can return NFS4ERR_DELAY). | ||||
| </t> | ||||
| <t> | ||||
| In case 2, the client will not wait for the LAYOUTGET | ||||
| response before processing the recall because waiting | ||||
| would cause deadlock. Therefore, the action at the | ||||
| client will only require waiting in the case that the | ||||
| client has not yet seen the server's earlier responses | ||||
| to the LAYOUTGET operation(s). | ||||
| </t> | ||||
| <t> | ||||
| The recall process can be considered completed when | ||||
| the final LAYOUTRETURN operation for the recalled range is completed. | ||||
| The LAYOUTRETURN uses the layout stateid (with seqid) specified in | ||||
| CB_LAYOUTRECALL. If the client uses multiple LAYOUTRETURNs in | ||||
| processing the recall, the first LAYOUTRETURN will use the layout | ||||
| stateid as specified in CB_LAYOUTRECALL. Subsequent LAYOUTRETURNs | ||||
| will use the highest seqid as is the usual case. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="layout_server_consider" numbered="true" toc="default"> | ||||
| <name>Server Considerations</name> | ||||
| <t> | ||||
| Consider a race from the metadata server's point of | ||||
| view. The metadata server has sent a CB_LAYOUTRECALL and receives | ||||
| an overlapping LAYOUTGET for the same file before the | ||||
| LAYOUTRETURN(s) that respond to the CB_LAYOUTRECALL. There are | ||||
| three cases: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The client sent the LAYOUTGET before processing the CB_LAYOUTRECALL. | ||||
| The "seqid" in the layout stateid of the arguments of LAYOUTGET is one less | ||||
| than the "seqid" in CB_LAYOUTRECALL. The server returns | ||||
| NFS4ERR_RECALLCONFLICT to the client, which indicates to the client | ||||
| that there is a pending recall. | ||||
| </li> | ||||
| <li> | ||||
| The client sent the LAYOUTGET after processing the | ||||
| CB_LAYOUTRECALL, but the LAYOUTGET arrived before the LAYOUTRETURN and | ||||
| the response to CB_LAYOUTRECALL that | ||||
| completed that processing. | ||||
| The "seqid" in the layout stateid | ||||
| of LAYOUTGET is equal to or greater than that of the "seqid" in | ||||
| CB_LAYOUTRECALL. | ||||
| The server has not received a response to the CB_LAYOUTRECALL, | ||||
| so it returns NFS4ERR_RECALLCONFLICT. | ||||
| </li> | ||||
| <li> | ||||
| The client sent the LAYOUTGET after processing the | ||||
| CB_LAYOUTRECALL; the server received the CB_LAYOUTRECALL | ||||
| response, but the LAYOUTGET arrived before the LAYOUTRETURN that | ||||
| completed that processing. | ||||
| The "seqid" in the layout stateid | ||||
| of LAYOUTGET is equal to that of the "seqid" in | ||||
| CB_LAYOUTRECALL. | ||||
| The server has received a response to the CB_LAYOUTRECALL, | ||||
| so it returns NFS4ERR_RETURNCONFLICT. | ||||
| </li> | ||||
| </ol> | ||||
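| <t> | ||||
| The three cases above can be condensed into a small decision function. This is an illustrative sketch only: the function and parameter names are hypothetical, comparisons ignore "seqid" wraparound, and combinations not covered by the three cases yield None rather than a guessed result. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Optional


def layoutget_during_recall(get_seqid: int, recall_seqid: int,
                            recall_reply_received: bool) -> Optional[str]:
    """Pick the error for an overlapping LAYOUTGET seen during a recall."""
    if get_seqid == recall_seqid - 1:
        # Case 1: LAYOUTGET was sent before the client processed the recall.
        return "NFS4ERR_RECALLCONFLICT"
    if get_seqid >= recall_seqid and not recall_reply_received:
        # Case 2: sent after processing, but the CB_LAYOUTRECALL reply has
        # not reached the server yet.
        return "NFS4ERR_RECALLCONFLICT"
    if get_seqid == recall_seqid and recall_reply_received:
        # Case 3: the callback reply arrived, the final LAYOUTRETURN has not.
        return "NFS4ERR_RETURNCONFLICT"
    return None  # not one of the three cases described in the text
```
| ]]></sourcecode> | ||||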
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Wraparound and Validation of Seqid</name> | ||||
| <t> | ||||
| The rules for layout stateid processing differ from other stateids | ||||
| in the protocol because the "seqid" value cannot be zero and the | ||||
| stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The | ||||
| non-zero requirement combined with the inherent parallelism of | ||||
| layout operations means that a set of LAYOUTGET and LAYOUTRETURN | ||||
| operations may contain the same value for "seqid". | ||||
| The server uses a slightly modified version of the modulo arithmetic | ||||
| as described in | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/> | ||||
| when incrementing the layout stateid's "seqid". The difference | ||||
| is that zero is not a valid value for "seqid"; when the value | ||||
| of a "seqid" is 0xFFFFFFFF, the next valid value will be 0x00000001. | ||||
| The modulo arithmetic is also used for the comparisons of | ||||
| "seqid" values in the processing of CB_LAYOUTRECALL events as | ||||
| described above in <xref target="layout_server_consider" format="default"/>. | ||||
| </t> | ||||
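| <t> | ||||
| As an illustration of the increment rule just described, the following sketch skips zero when the "seqid" wraps. The function name and the ValueError check are assumptions made for the example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
SEQID_MAX = 0xFFFFFFFF  # largest 32-bit "seqid" value


def next_layout_seqid(seqid: int) -> int:
    """Return the "seqid" that follows the given one; zero is never valid."""
    if not 1 <= seqid <= SEQID_MAX:
        raise ValueError("layout stateid seqid must be in 1..0xFFFFFFFF")
    # After 0xFFFFFFFF the next valid value is 0x00000001, not zero.
    return 1 if seqid == SEQID_MAX else seqid + 1
```
| ]]></sourcecode> | ||||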
| <t> | ||||
| Just as the server validates the "seqid" in the event of | ||||
| CB_LAYOUTRECALL usage, as described in | ||||
| <xref target="layout_server_consider" format="default"/>, the server also validates | ||||
| the "seqid" value to ensure that it is within an appropriate range. | ||||
| This range represents the degree of parallelism the server supports | ||||
| for layout stateids. If the client is sending multiple layout | ||||
| operations to the server in parallel, by definition, the "seqid" | ||||
| value in the supplied stateid will not be the current "seqid" as | ||||
| held by the server. The range of parallelism spans from the highest | ||||
| or current "seqid" to a "seqid" value in the past. To assist in the | ||||
| discussion, the server's current "seqid" value for a layout stateid | ||||
| is defined as SERVER_CURRENT_SEQID. The lowest "seqid" value that | ||||
| is acceptable to the server is represented by PAST_SEQID. And the | ||||
| value for the range of valid "seqid"s or range of parallelism is | ||||
| VALID_SEQID_RANGE. Therefore, the following holds: | ||||
| VALID_SEQID_RANGE = SERVER_CURRENT_SEQID - PAST_SEQID. In the | ||||
| following, all arithmetic is the modulo arithmetic as described | ||||
| above. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MUST</bcp14> support a minimum VALID_SEQID_RANGE. The minimum is | ||||
| defined as: VALID_SEQID_RANGE = summation over 1..N of | ||||
| (ca_maxoperations(i) - 1), where N is the number of session fore | ||||
| channels and ca_maxoperations(i) is the value of the ca_maxoperations returned from | ||||
| CREATE_SESSION of the i'th session. The reason for "- 1" is to allow for the required | ||||
| SEQUENCE operation. The server <bcp14>MAY</bcp14> support a VALID_SEQID_RANGE | ||||
| value larger than the minimum. The maximum VALID_SEQID_RANGE is (2<sup>32</sup> - 2) (accounting for zero not being a valid "seqid" value). | ||||
| </t> | ||||
| <t> | ||||
| If the server finds the "seqid" is zero, the NFS4ERR_BAD_STATEID | ||||
| error is returned to the client. The server further validates the | ||||
| "seqid" to ensure it is within the range of parallelism, | ||||
| VALID_SEQID_RANGE. If the "seqid" value is outside of that range, | ||||
| the error NFS4ERR_OLD_STATEID is returned to the client. Upon | ||||
| receipt of NFS4ERR_OLD_STATEID, the client updates the stateid in | ||||
| the layout request based on processing of other layout requests and | ||||
| re-sends the operation to the server. | ||||
| </t> | ||||
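| <t> | ||||
| The range computation and the validation steps above can be sketched as follows. The helper names are hypothetical, and the distance calculation is one possible reading of the modified modulo arithmetic: because zero is excluded, the valid values form a cycle of 2^32 - 1 elements. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Iterable, Optional

SEQID_CYCLE = 0xFFFFFFFF  # count of valid values, 1 .. 0xFFFFFFFF


def min_valid_seqid_range(ca_maxoperations: Iterable[int]) -> int:
    """Sum of (ca_maxoperations(i) - 1) over the fore channel sessions."""
    # The "- 1" accounts for the SEQUENCE operation required in each request.
    return sum(maxops - 1 for maxops in ca_maxoperations)


def validate_layout_seqid(seqid: int, server_current_seqid: int,
                          valid_seqid_range: int) -> Optional[str]:
    """Return an error name, or None when the "seqid" is acceptable."""
    if seqid == 0:
        return "NFS4ERR_BAD_STATEID"
    # How far the supplied value lags the server's current one, skipping zero.
    behind = (server_current_seqid - seqid) % SEQID_CYCLE
    if behind > valid_seqid_range:
        return "NFS4ERR_OLD_STATEID"
    return None
```
| ]]></sourcecode> | ||||
| <t> | ||||
| With two sessions whose ca_maxoperations values are 8 and 16, the minimum range in this sketch is 22; a "seqid" of 0xFFFFFFFF checked against a current value of 2 lags by two steps and is therefore accepted. | ||||
| </t> | ||||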
| </section> | ||||
| <section anchor="bulk_layouts" numbered="true" toc="default"> | ||||
| <name>Bulk Recall and Return</name> | ||||
| <t> | ||||
| pNFS supports recalling and returning all layouts that | ||||
| are for files belonging to a particular fsid | ||||
| (LAYOUTRECALL4_FSID, LAYOUTRETURN4_FSID) or client ID | ||||
| (LAYOUTRECALL4_ALL, LAYOUTRETURN4_ALL). | ||||
| There are no "bulk" stateids, so detection of races | ||||
| via the seqid is not possible. | ||||
| The server <bcp14>MUST NOT</bcp14> initiate bulk recall while another | ||||
| recall is in progress, or the corresponding LAYOUTRETURN | ||||
| is in progress or pending. | ||||
| In the event the server sends a bulk recall | ||||
| while the client has a pending or in-progress LAYOUTRETURN, | ||||
| CB_LAYOUTRECALL, or LAYOUTGET, the client returns | ||||
| NFS4ERR_DELAY. In the event the client sends a LAYOUTGET | ||||
| or LAYOUTRETURN while a bulk recall is in progress, the | ||||
| server returns NFS4ERR_RECALLCONFLICT. | ||||
| If the client sends a LAYOUTGET or LAYOUTRETURN after | ||||
| the server receives NFS4ERR_DELAY from a bulk recall, | ||||
| then to ensure forward progress, the server <bcp14>MAY</bcp14> return | ||||
| NFS4ERR_RECALLCONFLICT. | ||||
| </t> | ||||
| <t> | ||||
| Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, | ||||
| the server <bcp14>MUST NOT</bcp14> allow the client to use any layout | ||||
| stateid except for LAYOUTCOMMIT operations. Once the client receives | ||||
| a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL, it <bcp14>MUST NOT</bcp14> use | ||||
| any layout stateid except for LAYOUTCOMMIT operations. | ||||
| Once a LAYOUTRETURN of LAYOUTRETURN4_ALL is sent, all | ||||
| layout stateids granted to the client ID are freed. | ||||
| The client <bcp14>MUST NOT</bcp14> use the layout stateids again. It | ||||
| <bcp14>MUST</bcp14> use LAYOUTGET to obtain new layout stateids. | ||||
| </t> | ||||
| <t> | ||||
| Once a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID is sent, the | ||||
| server <bcp14>MUST NOT</bcp14> allow the client to use any layout stateid | ||||
| that refers to a file with the specified fsid except for | ||||
| LAYOUTCOMMIT operations. Once the client receives a CB_LAYOUTRECALL | ||||
| of LAYOUTRECALL4_FSID, it <bcp14>MUST NOT</bcp14> use any layout stateid | ||||
| that refers to a file with the specified fsid except | ||||
| for LAYOUTCOMMIT operations. | ||||
| Once a LAYOUTRETURN of LAYOUTRETURN4_FSID is sent, all | ||||
| layout stateids granted to the referenced fsid are freed. | ||||
| The client <bcp14>MUST NOT</bcp14> use those freed layout stateids for files | ||||
| with the referenced fsid again. Subsequently, for any file with | ||||
| the referenced fsid, to use a layout, the client <bcp14>MUST</bcp14> first | ||||
| send a LAYOUTGET operation in order to | ||||
| obtain a new layout stateid for that file. | ||||
| </t> | ||||
| <t> | ||||
| If the server has sent a bulk CB_LAYOUTRECALL and | ||||
| receives a LAYOUTGET, or a LAYOUTRETURN with a stateid, | ||||
| the server <bcp14>MUST</bcp14> return NFS4ERR_RECALLCONFLICT. If the | ||||
| server has sent a bulk CB_LAYOUTRECALL and receives a | ||||
| LAYOUTRETURN with an lr_returntype that is not equal to | ||||
| the lor_recalltype of the CB_LAYOUTRECALL, the server | ||||
| <bcp14>MUST</bcp14> return NFS4ERR_RECALLCONFLICT. | ||||
| </t> | ||||
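| <t> | ||||
| The server-side rules in the preceding paragraph can be sketched as below. The function name and the string operation tags are hypothetical; the numeric values are those of the return/recall type enumeration (FILE = 1, FSID = 2, ALL = 3), and a LAYOUTRETURN of type FILE is taken to be the form that carries a stateid. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Optional

LAYOUTRETURN4_FILE = 1
LAYOUTRETURN4_FSID = 2
LAYOUTRETURN4_ALL = 3
LAYOUTRECALL4_FSID = 2
LAYOUTRECALL4_ALL = 3


def reply_during_bulk_recall(op: str, lor_recalltype: int,
                             lr_returntype: Optional[int] = None
                             ) -> Optional[str]:
    """Error the server returns while its bulk CB_LAYOUTRECALL is pending."""
    if op == "LAYOUTGET":
        return "NFS4ERR_RECALLCONFLICT"
    if op == "LAYOUTRETURN":
        if lr_returntype == LAYOUTRETURN4_FILE:
            # A per-file return carries a stateid, which is refused here.
            return "NFS4ERR_RECALLCONFLICT"
        if lr_returntype != lor_recalltype:
            # The return type must match the type of the bulk recall.
            return "NFS4ERR_RECALLCONFLICT"
    return None  # e.g., the matching bulk LAYOUTRETURN is accepted
```
| ]]></sourcecode> | ||||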
| </section> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="revoke_layout" numbered="true" toc="default"> | ||||
| <name>Revoking Layouts</name> | ||||
| <t> | ||||
| Parallel NFS permits servers to revoke layouts from clients | ||||
| that fail to respond to recalls and/or fail to renew their | ||||
| lease in time. Depending on the layout type, | ||||
| the server might revoke the layout and might take certain actions | ||||
| with respect to the client's I/O to data servers. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="async_writes" numbered="true" toc="default"> | ||||
| <name>Metadata Server Write Propagation</name> | ||||
| <t> | ||||
| Asynchronous writes written through the metadata server may be | ||||
| propagated lazily to the storage devices. For data written | ||||
| asynchronously through the metadata server, a client performing a | ||||
| read at the appropriate storage device is not guaranteed to see the | ||||
| newly written data until a COMMIT occurs at the metadata server. | ||||
| While the write is pending, reads to the storage device may give out | ||||
| either the old data, the new data, or a mixture of new and old. | ||||
| Upon completion of a synchronous WRITE or COMMIT (for asynchronously | ||||
| written data), the metadata server <bcp14>MUST</bcp14> ensure that storage devices | ||||
| give out the new data and that the data has been written to stable | ||||
| storage. If the server implements its storage in any way such that | ||||
| it cannot obey these constraints, then it <bcp14>MUST</bcp14> recall the layouts to | ||||
| prevent reads being done that cannot be handled correctly. Note | ||||
| that the layouts <bcp14>MUST</bcp14> be recalled prior to the server responding to | ||||
| the associated WRITE operations. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>pNFS Mechanics</name> | ||||
| <t> | ||||
| This section describes the operations flow taken by a pNFS client | ||||
| to a metadata server and storage device. | ||||
| </t> | ||||
| <t> | ||||
| When a pNFS client encounters a new FSID, it sends a GETATTR to the | ||||
| NFSv4.1 server for the fs_layout_type (<xref target="attrdef_fs_layout_type" format="default"/>) attribute. If the attribute returns at least one layout type, | ||||
| and the layout types returned are among the set supported by | ||||
| the client, the client knows that pNFS is a possibility for the file | ||||
| system. If, from the server that returned the new FSID, the client | ||||
| does not have a client ID that came from an EXCHANGE_ID result that | ||||
| returned EXCHGID4_FLAG_USE_PNFS_MDS, it <bcp14>MUST</bcp14> send an EXCHANGE_ID to | ||||
| the server with the EXCHGID4_FLAG_USE_PNFS_MDS bit set. If the | ||||
| server's response does not have EXCHGID4_FLAG_USE_PNFS_MDS, then | ||||
| contrary to what the fs_layout_type attribute said, the server does | ||||
| not support pNFS, and the client will not be able to use pNFS with that | ||||
| server; in this case, the server <bcp14>MUST</bcp14> return NFS4ERR_NOTSUPP in | ||||
| response to any pNFS operation. | ||||
| </t> | ||||
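| <t> | ||||
| The discovery check above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that one layout type common to server and client is sufficient; the flag value is that of EXCHGID4_FLAG_USE_PNFS_MDS. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Iterable

EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000


def pnfs_usable(fs_layout_types: Iterable[int],
                client_layout_types: Iterable[int],
                eir_flags: int) -> bool:
    """True when pNFS can be used for a newly encountered file system.

    fs_layout_types     -- value of the fs_layout_type attribute (GETATTR)
    client_layout_types -- layout types the client implements
    eir_flags           -- flags returned in the EXCHANGE_ID result
    """
    common = set(fs_layout_types) & set(client_layout_types)
    if not common:
        return False  # no layout type that both sides understand
    # The server must confirm the metadata server role in EXCHANGE_ID.
    return bool(eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS)
```
| ]]></sourcecode> | ||||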
| <t> | ||||
| The client then creates a session, requesting a persistent session, so | ||||
| that exclusive creates can be done with a single round trip via the | ||||
| createmode4 of GUARDED4. If the session ends up not being persistent, | ||||
| the client will use EXCLUSIVE4_1 for exclusive creates. | ||||
| </t> | ||||
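| <t> | ||||
| A minimal sketch of that choice, assuming the client inspects the flags returned by CREATE_SESSION (the function name is hypothetical): | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
CREATE_SESSION4_FLAG_PERSIST = 0x00000001
GUARDED4 = 1      # createmode4 value
EXCLUSIVE4_1 = 3  # createmode4 value


def exclusive_createmode(csr_flags: int) -> int:
    """Pick the createmode4 used for exclusive creates on this session."""
    if csr_flags & CREATE_SESSION4_FLAG_PERSIST:
        # Persistent session: GUARDED4 gives an exclusive create in a
        # single round trip.
        return GUARDED4
    return EXCLUSIVE4_1
```
| ]]></sourcecode> | ||||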
| <t> | ||||
| If a file is to be created on a pNFS-enabled file | ||||
| system, the client uses the OPEN operation. With the | ||||
| normal set of attributes that may be provided upon OPEN | ||||
| used for creation, there is an <bcp14>OPTIONAL</bcp14> layout_hint | ||||
| attribute. The client's use of layout_hint allows the | ||||
| client to express its preference for a layout type and its | ||||
| associated layout details. The use of a createmode4 of | ||||
| UNCHECKED4, GUARDED4, or EXCLUSIVE4_1 will allow the | ||||
| client to provide the layout_hint attribute at create | ||||
| time. The client <bcp14>MUST NOT</bcp14> use EXCLUSIVE4 (see <xref target="exclusive_create" format="default"/>). The client is <bcp14>RECOMMENDED</bcp14> | ||||
| to combine a GETATTR operation after the OPEN within | ||||
| the same COMPOUND. The GETATTR may then retrieve | ||||
| the layout_type attribute for the newly created file. | ||||
| The client will then know what layout type the server has | ||||
| chosen for the file and therefore what storage protocol | ||||
| the client must use. | ||||
| </t> | ||||
| <t> | ||||
| If the client wants to open an existing file, then it also includes | ||||
| a GETATTR to determine what layout type the file supports. | ||||
| </t> | ||||
| <t> | ||||
| The GETATTR in either the file creation or plain file open case can | ||||
| also include the layout_blksize and layout_alignment attributes so | ||||
| that the client can determine optimal offsets and lengths for I/O on | ||||
| the file. | ||||
| </t> | ||||
| <t> | ||||
| Assuming the client supports the layout type returned by GETATTR and | ||||
| it chooses to use pNFS for data access, it then sends LAYOUTGET | ||||
| using the filehandle and stateid returned by OPEN, specifying the range it wants | ||||
| to do I/O on. The response is a layout, which may be a subset of the | ||||
| range for which the client asked. It also includes device IDs and a | ||||
| description of how data is organized (or in the case of writing, how | ||||
| data is to be organized) across the devices. The device IDs and | ||||
| data description are encoded in a format that is specific to the | ||||
| layout type but that the client is expected to understand. | ||||
| </t> | ||||
| <t> | ||||
| When the client wants to send an I/O, it determines to which device ID | ||||
| it needs to send the I/O command by examining the data | ||||
| description in the layout. It then sends a | ||||
| GETDEVICEINFO to find the device address(es) of the device ID. The | ||||
| client then sends the I/O request to one of the device ID's device addresses, using the | ||||
| storage protocol defined for the layout type. | ||||
| Note that if a client has multiple I/Os to send, | ||||
| these I/O requests may be done in parallel. | ||||
| </t> | ||||
| <t> | ||||
| If the I/O was a WRITE, then at some point | ||||
| the client may want to use LAYOUTCOMMIT to | ||||
| commit the modification time and the new size | ||||
| of the file (if it believes it extended the file size) to the | ||||
| metadata server and the modified data to the file system. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="crash_recovery" numbered="true" toc="default"> | ||||
| <name>Recovery</name> | ||||
| <t> | ||||
| Recovery is complicated by the distributed nature of the pNFS | ||||
| protocol. In general, crash recovery for layouts is similar to | ||||
| crash recovery for delegations in the base NFSv4.1 protocol. However, | ||||
| the client's ability to perform I/O without contacting the metadata | ||||
| server introduces subtleties that must be handled correctly if | ||||
| the possibility of file system corruption is to be avoided. | ||||
| </t> | ||||
| <section anchor="pnfs_client_recovery" numbered="true" toc="default"> | ||||
| <name>Recovery from Client Restart</name> | ||||
| <t> | ||||
| Client recovery for layouts is similar to client recovery for other | ||||
| lock and delegation state. When a pNFS client restarts, it will lose | ||||
| all information about the layouts that it previously owned. There | ||||
| are two methods by which the server can reclaim these resources and | ||||
| allow otherwise conflicting layouts to be provided to other | ||||
| clients. | ||||
| </t> | ||||
| <t> | ||||
| The first is through the expiry of the client's lease. If the | ||||
| client recovery time is longer than the lease period, the client's | ||||
| lease will expire and the server will know that state may be | ||||
| released. For layouts, the server may release the state immediately | ||||
| upon lease expiry or it may allow the layout to persist, awaiting | ||||
| possible lease revival, as long as no other layout conflicts. | ||||
| </t> | ||||
| <t> | ||||
| The second is through the client restarting in less time than it | ||||
| takes for the lease period to expire. In such a case, the client | ||||
| will contact the server through the standard EXCHANGE_ID protocol. | ||||
| The server will find that the client's co_ownerid matches the | ||||
| co_ownerid of the previous client invocation, but that the verifier | ||||
| is different. The server uses this as a signal to release all | ||||
| layout state associated with the client's previous invocation. In | ||||
| this scenario, the data written by the client but not covered by a | ||||
| successful LAYOUTCOMMIT is in an undefined state; it may have been | ||||
| written or it may now be lost. This is acceptable behavior and it | ||||
| is the client's responsibility to use LAYOUTCOMMIT to achieve the | ||||
| desired level of stability. | ||||
| </t> | ||||
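| <t> | ||||
| The restart detection described above can be illustrated with a deliberately simplified sketch. The dictionary-based client table and the function name are hypothetical, and the many other EXCHANGE_ID cases (principal checks, confirmation state) are omitted. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Dict


def restart_detected(known_clients: Dict[bytes, bytes],
                     co_ownerid: bytes, co_verifier: bytes) -> bool:
    """Record an EXCHANGE_ID and report whether it signals a client restart.

    known_clients maps each co_ownerid to the verifier last seen for it.
    A True result means the layout state of the previous client
    invocation should be released.
    """
    previous = known_clients.get(co_ownerid)
    known_clients[co_ownerid] = co_verifier
    # Same owner string, different verifier: the client has restarted.
    return previous is not None and previous != co_verifier
```
| ]]></sourcecode> | ||||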
| </section> | ||||
| <section anchor="lease_expiration_client" numbered="true" toc="default"> | ||||
| <name>Dealing with Lease Expiration on the Client</name> | ||||
| <t anchor="pnfs_clnt_case1"> | ||||
| If a client believes its lease has expired, it <bcp14>MUST NOT</bcp14> send I/O | ||||
| to the storage device until it has validated its lease. The client | ||||
| can send a SEQUENCE operation to the metadata server. If the | ||||
| SEQUENCE operation is successful, but sr_status_flags has | ||||
| SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | ||||
| SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, or | ||||
| SEQ4_STATUS_ADMIN_STATE_REVOKED set, the client <bcp14>MUST NOT</bcp14> use | ||||
| currently held layouts. The client has two | ||||
| choices to recover from the lease expiration. First, for all | ||||
| modified but uncommitted data, the client writes it to the metadata server | ||||
| using the FILE_SYNC4 flag for the WRITEs, or WRITE and | ||||
| COMMIT. Second, the client re-establishes a client ID and session with | ||||
| the server and obtains new layouts and device-ID-to-device-address | ||||
| mappings for the modified data ranges and then writes the data to the | ||||
| storage devices with the newly obtained layouts. | ||||
| </t> | ||||
| <t anchor="pnfs_clnt_case2"> | ||||
| If sr_status_flags from the metadata server has | ||||
| SEQ4_STATUS_RESTART_RECLAIM_NEEDED set | ||||
| (or SEQUENCE returns NFS4ERR_BAD_SESSION and | ||||
| CREATE_SESSION returns NFS4ERR_STALE_CLIENTID), then the metadata | ||||
| server has restarted, and the client <bcp14>SHOULD</bcp14> recover using the | ||||
| methods described in <xref target="mds_recovery" format="default"/>. | ||||
| </t> | ||||
| <t anchor="pnfs_clnt_case3"> | ||||
| If sr_status_flags from the metadata server has | ||||
| SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following | ||||
| the procedure described in <xref target="transferred_lease" format="default"/>. After that, the client may get an | ||||
| indication that the layout state was not moved with the file | ||||
| system. The client recovers as in the other | ||||
| applicable situations discussed in the first two paragraphs of this section. | ||||
| </t> | ||||
| <t anchor="pnfs_clnt_case4"> | ||||
| If sr_status_flags reports no loss of state, then the lease for the | ||||
| layouts that the client has are valid and | ||||
| renewed, and the client can once again send I/O requests to the | ||||
| storage devices. | ||||
| </t> | ||||
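| <t> | ||||
| The four cases above can be summarized as a small decision routine. | ||||
| The following is a minimal sketch, not a normative algorithm: the | ||||
| numeric flag values are assumed to match the SEQUENCE definitions | ||||
| given elsewhere in this specification, and the function name, the | ||||
| returned labels, and the order in which the flags are tested are | ||||
| illustrative assumptions. | ||||
| </t> | ||||

```python
# Assumed values of the sr_status_flags bits (verify against the SEQUENCE XDR).
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020
SEQ4_STATUS_LEASE_MOVED = 0x00000080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100

# Any of these bits means currently held layouts must not be used.
REVOKED_MASK = (
    SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
    | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
    | SEQ4_STATUS_ADMIN_STATE_REVOKED
)


def classify_lease_state(sr_status_flags):
    """Map sr_status_flags from a successful SEQUENCE to a recovery action.

    The labels returned are hypothetical names for the four cases in the text.
    """
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        return "recover_from_mds_restart"      # metadata server restarted
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
        return "recover_transferred_lease"     # lease moved with the file system
    if sr_status_flags & REVOKED_MASK:
        return "stop_using_layouts"            # write via MDS or get new layouts
    return "resume_io"                         # no loss of state reported
```

| <t> | ||||
| A client would call such a routine only after it suspects lease | ||||
| expiration and before sending any further I/O to a storage device. | ||||
| </t> | ||||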
| <t> | ||||
| While clients <bcp14>SHOULD NOT</bcp14> send I/Os to storage devices that may | ||||
| extend past the lease expiration time period, this is not always | ||||
| possible, for example, an extended network partition that starts | ||||
| after the I/O is sent and does not heal until the I/O request is | ||||
| received by the storage device. Thus, the metadata server and/or | ||||
| storage devices are responsible for protecting themselves from I/Os | ||||
| that are both sent before the lease expires and arrive after the lease | ||||
| expires. See <xref target="lease_expiration_mds" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="lease_expiration_mds" numbered="true" toc="default"> | ||||
| <name>Dealing with Loss of Layout State on the Metadata Server</name> | ||||
| <t> | ||||
| This is a description of the case where all of the following are | ||||
| true: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| the metadata server has not restarted | ||||
| </li> | ||||
| <li> | ||||
| a pNFS client's | ||||
| layouts have been discarded (usually because the client's lease | ||||
| expired) and are invalid | ||||
| </li> | ||||
| <li> | ||||
| an I/O from the pNFS client arrives at the storage device | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The metadata server and its storage devices <bcp14>MUST</bcp14> solve this by | ||||
| fencing the client. In other words, they <bcp14>MUST</bcp14> solve this by | ||||
| preventing the execution of I/O operations from the client to the | ||||
| storage devices after layout | ||||
| state loss. The details of how fencing is done are specific to the | ||||
| layout type. The solution for NFSv4.1 file-based layouts is | ||||
| described in (<xref target="file_layout_revoke" format="default"/>), and solutions for other | ||||
| layout types are in their respective external specification documents. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="mds_recovery" numbered="true" toc="default"> | ||||
| <name>Recovery from Metadata Server Restart</name> | ||||
| <t> | ||||
| The pNFS client will discover that the metadata server has | ||||
| restarted via the methods described in <xref target="server_failure" format="default"/> and discussed in a pNFS-specific | ||||
| context in <xref target="pnfs_clnt_case2" format="default"/>. The client <bcp14>MUST</bcp14> stop using | ||||
| layouts and delete the device ID to device address mappings it | ||||
| previously received from the metadata server. Having done that, | ||||
| if the client wrote data to the storage device without committing | ||||
| the layouts via LAYOUTCOMMIT, then the client has | ||||
| additional work to do in order to have the client, metadata server, | ||||
| and storage device(s) all synchronized on the state of the data. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| If the client has data still modified | ||||
| and unwritten in the client's memory, the client has only two choices. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The client can obtain a layout via LAYOUTGET after the | ||||
| server's grace period and write the data to the storage devices. | ||||
| </li> | ||||
| <li> | ||||
| The client can WRITE that data through the metadata server using the | ||||
| WRITE (<xref target="OP_WRITE" format="default"/>) operation, and then obtain | ||||
| layouts as desired. | ||||
| </li> | ||||
| </ol> | ||||
| </li> | ||||
| <li> | ||||
| If the client asynchronously wrote data to the storage device, but | ||||
| still has a copy of the data in its memory, then it has available | ||||
| to it the recovery options listed above in the previous bullet | ||||
| point. If the metadata server is also in its grace period, the | ||||
| client has available to it the options below in the next bullet | ||||
| point. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The client does not have a copy of the data in its memory and the | ||||
| metadata server is still in its grace period. The client cannot | ||||
| use LAYOUTGET (within or outside the grace period) to reclaim a | ||||
| layout because the contents of the response from LAYOUTGET | ||||
| may not match what it had previously. The range might be | ||||
| different or the client might get the same range but the content of the | ||||
| layout might be different. Even if the content of the layout | ||||
| appears to be the same, the device IDs may map to different | ||||
| device addresses, and even if the device addresses are the same, | ||||
| the device addresses could have been assigned to a different | ||||
| storage device. The option of retrieving the data from the | ||||
| storage device and writing it to the metadata server per the | ||||
| recovery scenario described above is | ||||
| not available because, again, the mappings of range to device ID, | ||||
| device ID to device address, and device address to physical device are | ||||
| stale, and new mappings via new LAYOUTGET do not solve the problem. | ||||
| </t> | ||||
| <t> | ||||
| The only recovery option for this scenario is to send a | ||||
| LAYOUTCOMMIT in reclaim mode, which the metadata server will | ||||
| accept as long as it is in its grace period. The use of | ||||
| LAYOUTCOMMIT in reclaim mode informs the metadata server that the | ||||
| layout has changed. It is critical that the metadata server | ||||
| receive this information before its grace period ends, and thus | ||||
| before it starts allowing updates to the file system. | ||||
| </t> | ||||
| <t> | ||||
| To send LAYOUTCOMMIT in reclaim mode, the client sets the | ||||
| loca_reclaim field of the operation's arguments (<xref target="OP_LAYOUTCOMMIT_ARGUMENT" format="default"/>) to TRUE. During the metadata | ||||
| server's recovery grace period (and only during the recovery grace | ||||
| period) the metadata server is prepared to accept LAYOUTCOMMIT | ||||
| requests with the loca_reclaim field set to TRUE. | ||||
| </t> | ||||
| <t> | ||||
| When loca_reclaim is TRUE, the client is attempting to commit | ||||
| changes to the layout that occurred prior to the restart | ||||
| of the metadata server. The metadata server applies some | ||||
| consistency checks on the loca_layoutupdate field of the arguments | ||||
| to determine whether the client can commit the data written to the | ||||
| storage device to the file system. The loca_layoutupdate field is of | ||||
| data type layoutupdate4 and contains layout-type-specific content | ||||
| (in the lou_body field of loca_layoutupdate). The | ||||
| layout-type-specific information that loca_layoutupdate might have | ||||
| is discussed in <xref target="layoutcommit_update" format="default"/>. If the | ||||
| metadata server's consistency checks on loca_layoutupdate succeed, | ||||
| then the metadata server <bcp14>MUST</bcp14> commit the data (as described by the | ||||
| loca_offset, loca_length, and loca_layoutupdate fields of the | ||||
| arguments) that was written to the storage device. If the metadata | ||||
| server's consistency checks on loca_layoutupdate fail, the | ||||
| metadata server rejects the LAYOUTCOMMIT operation and makes no | ||||
| changes to the file system. However, any time LAYOUTCOMMIT with | ||||
| loca_reclaim TRUE fails, the pNFS client has lost all the data in | ||||
| the range defined by <loca_offset, loca_length>. A client | ||||
| can defend against this risk by caching all data, whether written | ||||
| synchronously or asynchronously in its memory, and by not releasing the | ||||
| cached data until a successful LAYOUTCOMMIT. This condition | ||||
| does not hold true for all layout types; for example, file-based | ||||
| storage devices need not suffer from this limitation. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| The client does not have a copy of the data in its memory and the | ||||
| metadata server is no longer in its grace period; i.e., the metadata | ||||
| server returns NFS4ERR_NO_GRACE. As with the scenario in the above | ||||
| bullet point, the failure of LAYOUTCOMMIT means the data | ||||
| in the range <loca_offset, loca_length> is lost. The | ||||
| defense against the risk is the same -- cache all written data | ||||
| on the client until a successful LAYOUTCOMMIT. | ||||
| </li> | ||||
| </ul> | ||||
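| <t> | ||||
| The scenarios in the list above can be restated as a function of three | ||||
| facts known to the client. The following is a rough sketch under | ||||
| stated assumptions: the function name, its parameters, and the option | ||||
| labels are hypothetical and serve only to summarize the list; an empty | ||||
| result corresponds to the case in which the uncommitted data is lost. | ||||
| </t> | ||||

```python
def mds_restart_recovery_options(has_cached_copy, written_to_storage_device,
                                 mds_in_grace):
    """List the recovery options open to a pNFS client after an MDS restart.

    has_cached_copy: the client still holds the modified data in memory.
    written_to_storage_device: the data was written but not LAYOUTCOMMITted.
    mds_in_grace: the metadata server is still in its grace period.
    """
    options = []
    if has_cached_copy:
        # The data can simply be written again through either path.
        options.append("LAYOUTGET after grace period, then write to storage devices")
        options.append("WRITE through the metadata server")
    if written_to_storage_device and mds_in_grace:
        # Only accepted during the grace period (loca_reclaim set to TRUE).
        options.append("LAYOUTCOMMIT in reclaim mode")
    return options
```

| <t> | ||||
| The sketch makes the defensive strategy visible: as long as the client | ||||
| keeps a cached copy until a successful LAYOUTCOMMIT, the result is | ||||
| never empty. | ||||
| </t> | ||||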
| </section> | ||||
| <section anchor="pnfs_grace_exception" numbered="true" toc="default"> | ||||
| <name>Operations during Metadata Server Grace Period</name> | ||||
| <t> | ||||
| Some of the recovery scenarios described thus far noted that the | ||||
| metadata server may allow some operations (namely, WRITE and | ||||
| LAYOUTGET) during its grace period. For LAYOUTGET, the | ||||
| metadata server must reliably determine that servicing such a | ||||
| request will not conflict with an impending LAYOUTCOMMIT reclaim | ||||
| request. For WRITE, the metadata server | ||||
| must reliably determine that servicing the request | ||||
| will not conflict with an impending OPEN or with a LOCK where the | ||||
| file has mandatory byte-range locking enabled. | ||||
| </t> | ||||
| <t> | ||||
| As mentioned previously, for expediency, | ||||
| the metadata server might reject some | ||||
| operations (namely, WRITE and LAYOUTGET) during its | ||||
| grace period, because the simplest correct approach | ||||
| is to reject all non-reclaim pNFS requests and WRITE operations by | ||||
| returning the NFS4ERR_GRACE error. However, depending on the | ||||
| storage protocol (which is specific to the layout type) and | ||||
| metadata server implementation, the metadata server may be able to | ||||
| determine that a particular request is safe. For example, a | ||||
| metadata server may save provisional allocation mappings for each | ||||
| file to stable storage, as well as information about potentially | ||||
| conflicting OPEN share modes and mandatory byte-range locks that might | ||||
| have been in effect at the time of restart, and the metadata | ||||
| server may use this information during the recovery grace period to determine that a | ||||
| WRITE request is safe. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="storage_device_recovery" numbered="true" toc="default"> | ||||
| <name>Storage Device Recovery</name> | ||||
| <t> | ||||
| Recovery from storage device restart is mostly dependent upon the layout type | ||||
| in use. However, there are a few general techniques a client can | ||||
| use if it discovers a storage device has crashed while holding | ||||
| modified, uncommitted data that was asynchronously written. | ||||
| First and foremost, it | ||||
| is important to realize that the client is the only one that has the | ||||
| information necessary to recover non-committed data since | ||||
| it holds the modified data and probably nothing else does. Second, | ||||
| the best solution is for the client to err on the side of caution | ||||
| and attempt to rewrite the modified data through another path. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>SHOULD</bcp14> immediately WRITE the data to the metadata server, | ||||
| with the stable field in the WRITE4args set to FILE_SYNC4. Once it | ||||
| does this, there is no need to wait for the original storage device. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Metadata and Storage Device Roles</name> | ||||
| <t> | ||||
| If the same physical hardware is used to implement both a | ||||
| metadata server and storage device, then the same hardware | ||||
| entity is to be understood to be implementing two | ||||
| distinct roles and it is important that it be clearly | ||||
| understood on behalf of which role the hardware is | ||||
| executing at any given time. | ||||
| </t> | ||||
| <t> | ||||
| Two sub-cases can be distinguished. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The storage device uses NFSv4.1 as the storage protocol, i.e., the same | ||||
| physical hardware is used to implement both a metadata and data | ||||
| server. See <xref target="pnfs_session_stuff" format="default"/> | ||||
| for a description of how multiple roles are handled. | ||||
| </li> | ||||
| <li> | ||||
| The storage device does not use NFSv4.1 as the storage protocol, | ||||
| and the same physical hardware is used to implement both a | ||||
| metadata and storage device. Whether distinct network addresses | ||||
| are used to access the metadata server and storage device is | ||||
| immaterial. This is because it is always clear to the pNFS client and | ||||
| server, from the upper-layer protocol being used (NFSv4.1 or | ||||
| non-NFSv4.1), to which role the request to the common server network | ||||
| address is directed. | ||||
| </li> | ||||
| </ol> | ||||
| </section> | ||||
| <section anchor="security_considerations_pnfs" numbered="true" toc="default"> | ||||
| <name>Security Considerations for pNFS</name> | ||||
| <t> | ||||
| pNFS separates file system metadata and data and provides access to | ||||
| both. There are pNFS-specific operations (listed in | ||||
| <xref target="pnfs_ops" format="default"/>) that provide access to the metadata; all | ||||
| existing NFSv4.1 conventional (non-pNFS) security mechanisms and | ||||
| features apply to accessing the metadata. The combination of | ||||
| components in a pNFS system (see <xref target="fig_system" format="default"/>) is | ||||
| required to preserve the security properties of NFSv4.1 with respect | ||||
| to an entity that is accessing a storage device from a client, including | ||||
| security countermeasures to defend against threats for which NFSv4.1 | ||||
| provides defenses in environments where these threats are | ||||
| considered significant. | ||||
| </t> | ||||
| <t> | ||||
| In some cases, the security countermeasures for connections | ||||
| to storage devices may take the form of physical isolation or a | ||||
| recommendation to avoid the use of pNFS in an environment. For example, it | ||||
| may be impractical to provide confidentiality protection for some | ||||
| storage protocols to protect against eavesdropping. In | ||||
| environments where eavesdropping on such protocols is of sufficient | ||||
| concern to require countermeasures, physical isolation of the | ||||
| communication channel (e.g., via direct connection from client(s) | ||||
| to storage device(s)) and/or a decision to forgo use of pNFS (e.g., | ||||
| by falling back to conventional NFSv4.1) may be appropriate courses of action. | ||||
| </t> | ||||
| <t> | ||||
| Where communication with storage devices is subject to the same | ||||
| threats as client-to-metadata server communication, the protocols | ||||
| used for that communication need to provide security mechanisms | ||||
| at least as strong as those available via RPCSEC_GSS for | ||||
| NFSv4.1. Except for the storage protocol used for the LAYOUT4_NFSV4_1_FILES | ||||
| layout (see <xref target="file_layout_type" format="default"/>), i.e., except for NFSv4.1, | ||||
| it is beyond the scope of this document to specify the security mechanisms | ||||
| for storage access protocols. | ||||
| </t> | ||||
| <t> | ||||
| pNFS implementations <bcp14>MUST NOT</bcp14> remove NFSv4.1's access controls. | ||||
| The combination of clients, storage devices, and the metadata server | ||||
| is responsible for ensuring that all client-to-storage-device file | ||||
| data access respects NFSv4.1's ACLs and file open modes. This entails | ||||
| performing both of these checks on every access in the client, the | ||||
| storage device, or both (as applicable; when the storage device is | ||||
| an NFSv4.1 server, the storage device is ultimately responsible for | ||||
| controlling access as described in <xref target="state_propagation" format="default"/>). | ||||
| If a pNFS configuration performs these checks only in the client, | ||||
| the risk of a misbehaving client obtaining unauthorized access is | ||||
| an important consideration in determining when it is appropriate to | ||||
| use such a pNFS configuration. Such layout types <bcp14>SHOULD NOT</bcp14> be used | ||||
| when client-only access checks do not provide sufficient assurance | ||||
| that NFSv4.1 access control is being applied correctly. (This | ||||
| is not a problem for the file layout type described in <xref target="file_layout_type" format="default"/> because the storage access protocol for | ||||
| LAYOUT4_NFSV4_1_FILES is NFSv4.1, and thus the security model for | ||||
| storage device access via LAYOUT4_NFSV4_1_FILES is the same as that | ||||
| of the metadata server.) For handling of access control specific to | ||||
| a layout, the reader should examine the layout specification, such as | ||||
| the <xref target="file_layout_type" format="default">NFSv4.1/file-based layout</xref> | ||||
| of this document, the <xref target="RFC5663" format="default">blocks | ||||
| layout</xref>, and <xref target="RFC5664" format="default">objects | ||||
| layout</xref>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="file_layout_type" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type</name> | ||||
| <t> | ||||
| This section describes the semantics and format of NFSv4.1 file-based | ||||
| layouts for pNFS. | ||||
| NFSv4.1 file-based layouts use the LAYOUT4_NFSV4_1_FILES layout type. | ||||
| The LAYOUT4_NFSV4_1_FILES type defines | ||||
| striping data across multiple NFSv4.1 data servers. | ||||
| </t> | ||||
| <section anchor="pnfs_session_stuff" numbered="true" toc="default"> | ||||
| <name>Client ID and Session Considerations</name> | ||||
| <t> | ||||
| Sessions are a <bcp14>REQUIRED</bcp14> feature of NFSv4.1, and this | ||||
| extends to both the metadata server and file-based (NFSv4.1-based) | ||||
| data servers. | ||||
| </t> | ||||
| <t> | ||||
| The role a server plays in pNFS is determined by the result it returns | ||||
| from EXCHANGE_ID. | ||||
| The roles are: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result eir_flags). | ||||
| </li> | ||||
| <li> | ||||
| Data server (EXCHGID4_FLAG_USE_PNFS_DS). | ||||
| </li> | ||||
| <li> | ||||
| Non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an NFSv4.1 | ||||
| server that does not support operations (e.g., | ||||
| LAYOUTGET) or attributes that pertain to pNFS. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The client <bcp14>MAY</bcp14> request zero or more of | ||||
| EXCHGID4_FLAG_USE_NON_PNFS, | ||||
| EXCHGID4_FLAG_USE_PNFS_DS, or | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS, even though some combinations | ||||
| (e.g., EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS) are | ||||
| contradictory. However, the server <bcp14>MUST</bcp14> only return the following | ||||
| acceptable combinations: | ||||
| </t> | ||||
| <table align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Acceptable Results from EXCHANGE_ID</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| EXCHGID4_FLAG_USE_PNFS_DS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| EXCHGID4_FLAG_USE_NON_PNFS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| As the above table implies, a server can have one | ||||
| or two roles. A server can be both a metadata server | ||||
| and a data server, or it can be both a data server and | ||||
| non-metadata server. In addition to returning two roles | ||||
| in the EXCHANGE_ID's results, and thus serving both roles | ||||
| via a common client ID, a server can serve two roles | ||||
| by returning a unique client ID and server owner for | ||||
| each role in each of two EXCHANGE_ID results, with each | ||||
| result indicating a single role. | ||||
| </t> | ||||
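| <t> | ||||
| The table of acceptable results lends itself to a simple membership | ||||
| check. The following is a minimal sketch: the numeric flag values are | ||||
| assumed to match the EXCHANGE_ID definitions given elsewhere in this | ||||
| specification, and the helper name and mask are hypothetical. | ||||
| </t> | ||||

```python
# Assumed values of the pNFS role bits in eir_flags (verify against the XDR).
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000

PNFS_ROLE_MASK = (
    EXCHGID4_FLAG_USE_NON_PNFS
    | EXCHGID4_FLAG_USE_PNFS_MDS
    | EXCHGID4_FLAG_USE_PNFS_DS
)

# The five combinations listed in the table of acceptable results.
ACCEPTABLE_ROLE_COMBINATIONS = frozenset({
    EXCHGID4_FLAG_USE_PNFS_MDS,
    EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS,
    EXCHGID4_FLAG_USE_PNFS_DS,
    EXCHGID4_FLAG_USE_NON_PNFS,
    EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS,
})


def server_roles_acceptable(eir_flags):
    """Return True if the role bits in eir_flags form an acceptable result."""
    return (eir_flags & PNFS_ROLE_MASK) in ACCEPTABLE_ROLE_COMBINATIONS
```

| <t> | ||||
| Bits outside the role mask are ignored by this check, so other | ||||
| EXCHANGE_ID result flags do not affect its outcome. | ||||
| </t> | ||||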
| <t> | ||||
| In the case of a server with concurrent pNFS roles that | ||||
| are served by a common client ID, if the EXCHANGE_ID | ||||
| request from the client has zero or a combination of the | ||||
| bits set in eia_flags, the server result should set bits | ||||
| that represent the higher of the acceptable combination | ||||
| of the server roles, with a preference to match the roles | ||||
| requested by the client. Thus, if a client request has | ||||
| (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS | ||||
| | EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server | ||||
| is both a metadata server and a data server, serving | ||||
| both the roles by a common client ID, the server | ||||
| <bcp14>SHOULD</bcp14> return with (EXCHGID4_FLAG_USE_PNFS_MDS | | ||||
| EXCHGID4_FLAG_USE_PNFS_DS) set. | ||||
| </t> | ||||
| <t> | ||||
| In the case of a server that has multiple concurrent | ||||
| pNFS roles, each role served by a unique client ID, | ||||
| if the client specifies zero or a combination of roles | ||||
| in the request, the server results <bcp14>SHOULD</bcp14> return only | ||||
| one of the roles from the combination specified by the | ||||
| client request. If the role specified by the server | ||||
| result does not match the intended use by the client, | ||||
| the client should send the EXCHANGE_ID specifying just | ||||
| the pNFS role of interest. | ||||
| </t> | ||||
| <t> | ||||
| If a pNFS metadata client gets a layout that refers it to an NFSv4.1 | ||||
| data server, it needs a client ID on that data server. If it does not | ||||
| yet have a client ID from the server that had the EXCHGID4_FLAG_USE_PNFS_DS | ||||
| flag set in the EXCHANGE_ID results, then the client needs to | ||||
| send an EXCHANGE_ID to the data server, using | ||||
| the same co_ownerid as it sent to the metadata server, with the | ||||
| EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. | ||||
| If the server's | ||||
| EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the | ||||
| client may use the client ID to create sessions that will | ||||
| exchange pNFS data operations. | ||||
| The client ID returned by the data server has no relationship with | ||||
| the client ID returned by a metadata server unless the client IDs | ||||
| are equal, and the server owners and server scopes of the data server | ||||
| and metadata server are equal. | ||||
| </t> | ||||
| <t> | ||||
| In NFSv4.1, the | ||||
| session ID in the SEQUENCE operation implies the | ||||
| client ID, which in turn might be used by the server to | ||||
| map the stateid to the right client/server pair. | ||||
| However, when a data server is presented with a READ or | ||||
| WRITE operation with a stateid, because the | ||||
| stateid is associated with a | ||||
| client ID on a metadata server, and because the session ID in | ||||
| the preceding SEQUENCE operation is tied to the | ||||
| client ID of the data server, the data server has no | ||||
| obvious way to determine the metadata server from the | ||||
| COMPOUND procedure, and thus has no way to validate the | ||||
| stateid. One <bcp14>RECOMMENDED</bcp14> approach is for pNFS servers to | ||||
| encode metadata server routing and/or identity | ||||
| information in the data server filehandles as returned | ||||
| in the layout. | ||||
| </t> | ||||
| <t> | ||||
| If metadata server routing and/or identity information is encoded | ||||
| in data server filehandles, | ||||
| when the metadata server identity or location | ||||
| changes, the data server filehandles it gave out will become | ||||
| invalid (stale), and so the metadata server <bcp14>MUST</bcp14> first | ||||
| recall the layouts. | ||||
| Invalidating a data server filehandle does not render | ||||
| the NFS client's data cache invalid. The client's cache should | ||||
| map a data server filehandle to a metadata server filehandle, and | ||||
| a metadata server filehandle to cached data. | ||||
| </t> | ||||
| <t> | ||||
| If a server is both a metadata server and a data server, | ||||
| the server might need to distinguish operations on | ||||
| files that are directed to the metadata server from | ||||
| those that are directed to the data server. It is | ||||
| <bcp14>RECOMMENDED</bcp14> that the values of the filehandles returned by | ||||
| the LAYOUTGET operation be different from the value | ||||
| of the filehandle returned by the OPEN of the same file. | ||||
| </t> | ||||
| <t> | ||||
| Another scenario is for the metadata server and the | ||||
| storage device to be distinct from one client's point of | ||||
| view, and the roles reversed from another client's point | ||||
| of view. For example, in the cluster file system model, | ||||
| a metadata server to one client might be a data server to | ||||
| another client. If NFSv4.1 is being used as the storage | ||||
| protocol, then pNFS servers need to encode the values | ||||
| of filehandles according to their specific roles. | ||||
| </t> | ||||
| <section anchor="dsonly" numbered="true" toc="default"> | ||||
| <name>Sessions Considerations for Data Servers</name> | ||||
| <t> | ||||
| <xref target="Obligations_of_the_Client" format="default"/> states | ||||
| that a client has to keep its lease renewed in | ||||
| order to prevent a session from being deleted by | ||||
| the server. If the reply to EXCHANGE_ID has just the | ||||
| EXCHGID4_FLAG_USE_PNFS_DS role set, then (as noted in | ||||
| <xref target="ds_ops" format="default"/>) the client will not be able | ||||
| to determine the data server's lease_time attribute | ||||
| because GETATTR will not be permitted. Instead, the | ||||
| rule is that any time a client receives a layout | ||||
| referring it to a data server that returns just | ||||
| the EXCHGID4_FLAG_USE_PNFS_DS role, the client <bcp14>MAY</bcp14> | ||||
| assume that the lease_time attribute from the metadata | ||||
| server that returned the layout applies to the data | ||||
| server. Thus, the data server <bcp14>MUST</bcp14> be aware of the values | ||||
| of all lease_time attributes of all metadata servers for which it | ||||
| is providing I/O, and it <bcp14>MUST</bcp14> use the maximum of all such | ||||
| lease_time values as the lease interval for all client | ||||
| IDs and sessions established on it. | ||||
| </t> | ||||
| <t> | ||||
| For example, if one metadata server has a lease_time | ||||
| attribute of 20 seconds, and a second metadata | ||||
| server has a lease_time attribute of 10 seconds, | ||||
| then if both servers return layouts that refer to an | ||||
| EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data | ||||
| server <bcp14>MUST</bcp14> renew a client's lease if the interval | ||||
| between two SEQUENCE operations on different COMPOUND | ||||
| requests is less than 20 seconds. | ||||
| </t> | ||||
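| <t> | ||||
| The rule and the example above reduce to taking a maximum. The | ||||
| following is a minimal sketch; the function names are hypothetical, | ||||
| and lease times are assumed to be expressed in seconds. | ||||
| </t> | ||||

```python
def data_server_lease_interval(mds_lease_times):
    """Lease interval a DS-only server applies to all client IDs and sessions.

    mds_lease_times: lease_time values (seconds) of every metadata server
    for which this data server is providing I/O.
    """
    if not mds_lease_times:
        raise ValueError("at least one metadata server lease_time is required")
    return max(mds_lease_times)


def must_renew_lease(seconds_between_sequences, mds_lease_times):
    """True if two SEQUENCE operations are close enough to renew the lease."""
    return seconds_between_sequences < data_server_lease_interval(mds_lease_times)
```

| <t> | ||||
| With lease_time values of 20 and 10 seconds, the interval is 20 | ||||
| seconds, so SEQUENCE operations 15 seconds apart renew the lease. | ||||
| </t> | ||||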
| </section> | ||||
| </section> | ||||
| <section anchor="file_layout_definitions" numbered="true" toc="default"> | ||||
| <name>File Layout Definitions</name> | ||||
| <t> | ||||
| The following definitions apply to the LAYOUT4_NFSV4_1_FILES | ||||
| layout type and may be applicable to other layout types. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Unit.</dt> | ||||
| <dd> | ||||
| A unit is a fixed-size quantity of data written to a data server. | ||||
| </dd> | ||||
| <dt>Pattern.</dt> | ||||
| <dd> | ||||
| A pattern is a method of distributing one or more | ||||
| equal-sized units across a set of data servers. | ||||
| A pattern is iterated one or more times. | ||||
| </dd> | ||||
| <dt>Stripe.</dt> | ||||
| <dd> | ||||
| A stripe is a set of data distributed | ||||
| across a set of data servers in a | ||||
| pattern before that pattern repeats. | ||||
| </dd> | ||||
| <dt>Stripe Count.</dt> | ||||
| <dd> | ||||
| A stripe count is the number of units in a pattern. | ||||
| </dd> | ||||
| <dt>Stripe Width.</dt> | ||||
| <dd> | ||||
| A stripe width is the size of a stripe in bytes. | ||||
| The stripe width = the stripe count * the size of the stripe unit. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| Hereafter, this document will refer to a unit that is written | ||||
| in a pattern as a "stripe unit". | ||||
| </t> | ||||
| <t> | ||||
| A pattern may have more stripe units than data servers. | ||||
| If so, some data servers will have more than one stripe unit | ||||
| per stripe. A data server that has multiple stripe | ||||
| units per stripe <bcp14>MAY</bcp14> store each unit in a different data file (and | ||||
| depending on the implementation, will possibly assign a unique data | ||||
| filehandle to each data file). | ||||
| </t> | ||||
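| <t> | ||||
| The definitions above imply some simple arithmetic. The following | ||||
| sketch computes the stripe width and, for a byte offset within a file, | ||||
| the stripe and the position of the stripe unit within the pattern. | ||||
| The function names are hypothetical, and the offset calculation is an | ||||
| illustration derived from the definitions; it does not determine which | ||||
| data server holds the stripe unit. | ||||
| </t> | ||||

```python
def stripe_width(stripe_count, stripe_unit_size):
    """Stripe width in bytes: stripe count times the stripe unit size."""
    return stripe_count * stripe_unit_size


def locate_stripe_unit(offset, stripe_count, stripe_unit_size):
    """Return (stripe_number, unit_in_pattern) for a byte offset in a file.

    stripe_number counts whole repetitions of the pattern before the offset;
    unit_in_pattern is the zero-based index of the stripe unit in the pattern.
    """
    unit_number = offset // stripe_unit_size      # stripe units before offset
    stripe_number = unit_number // stripe_count   # completed pattern iterations
    unit_in_pattern = unit_number % stripe_count  # position within the pattern
    return stripe_number, unit_in_pattern
```

| <t> | ||||
| For instance, with a stripe count of 4 and a stripe unit size of 65536 | ||||
| bytes, the stripe width is 262144 bytes, and byte offset 300000 falls | ||||
| in the first stripe unit of the second stripe. | ||||
| </t> | ||||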
| </section> | ||||
| <!-- [auth] "File Striping Definitions" "file_layout_definitions" --> | ||||
| <section anchor="file_data_types" numbered="true" toc="default"> | ||||
| <name>File Layout Data Types</name> | ||||
| <t> | ||||
| The high-level NFSv4.1 layout types are | ||||
| nfsv4_1_file_layouthint4, | ||||
| nfsv4_1_file_layout_ds_addr4, | ||||
| and nfsv4_1_file_layout4. | ||||
| </t> | ||||
| <t> | ||||
| The SETATTR operation supports a layout hint attribute | ||||
| (<xref target="attrdef_layout_hint" format="default"/>). | ||||
| When the client sets a layout hint (data type layouthint4) with | ||||
| a layout type of LAYOUT4_NFSV4_1_FILES (the loh_type field), | ||||
| the loh_body field contains a value of data type | ||||
| nfsv4_1_file_layouthint4. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const NFL4_UFLG_MASK = 0x0000003F; | ||||
| const NFL4_UFLG_DENSE = 0x00000001; | ||||
| const NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002; | ||||
| const NFL4_UFLG_STRIPE_UNIT_SIZE_MASK | ||||
| = 0xFFFFFFC0; | ||||
| typedef uint32_t nfl_util4; | ||||
| enum filelayout_hint_care4 { | ||||
| NFLH4_CARE_DENSE = NFL4_UFLG_DENSE, | ||||
| NFLH4_CARE_COMMIT_THRU_MDS | ||||
| = NFL4_UFLG_COMMIT_THRU_MDS, | ||||
| NFLH4_CARE_STRIPE_UNIT_SIZE | ||||
| = 0x00000040, | ||||
| NFLH4_CARE_STRIPE_COUNT = 0x00000080 | ||||
| }; | ||||
| /* Encoded in the loh_body field of data type layouthint4: */ | ||||
| struct nfsv4_1_file_layouthint4 { | ||||
| uint32_t nflh_care; | ||||
| nfl_util4 nflh_util; | ||||
| count4 nflh_stripe_count; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The generic layout hint structure is described | ||||
| in <xref target="layouthint4" format="default"/>. The client uses the | ||||
| layout hint in the layout_hint (<xref target="attrdef_layout_hint" format="default"/>) attribute to indicate the preferred type | ||||
| of layout to be used for a newly created file. The | ||||
| LAYOUT4_NFSV4_1_FILES layout-type-specific content for the | ||||
| layout hint is composed of three fields. The first field, | ||||
| nflh_care, is a set of flags indicating which values of the hint the | ||||
| client cares about. If the NFLH4_CARE_DENSE flag is set, then | ||||
| the client indicates in the second field, nflh_util, | ||||
| a preference for how the data | ||||
| file is packed (<xref target="sparse_dense" format="default"/>), which is controlled | ||||
| by the value of the expression nflh_util & NFL4_UFLG_DENSE ("&" represents the bitwise AND operator). If the | ||||
| NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates | ||||
| a preference for whether the client should send COMMIT operations | ||||
| to the metadata server or data server (<xref target="commit_thru_mds" format="default"/>), | ||||
| which is controlled by the value of nflh_util & NFL4_UFLG_COMMIT_THRU_MDS. | ||||
| If the NFLH4_CARE_STRIPE_UNIT_SIZE flag is set, the client indicates | ||||
| its preferred stripe unit size, which is indicated in | ||||
| nflh_util & | ||||
| NFL4_UFLG_STRIPE_UNIT_SIZE_MASK (thus, the stripe | ||||
| unit size <bcp14>MUST</bcp14> be a multiple of 64 bytes). The minimum stripe unit | ||||
| size is 64 bytes. | ||||
| If the NFLH4_CARE_STRIPE_COUNT flag is set, the client indicates | ||||
| in the third field, | ||||
| nflh_stripe_count, the stripe count. The stripe count multiplied | ||||
| by the stripe unit size is the stripe width. | ||||
| </t> | ||||
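| <t> | ||||
| As an illustration of the bit layout described above, the following Python sketch packs and unpacks an nfl_util4 value. The constants are copied from the XDR definitions; the helper names encode_nfl_util and decode_nfl_util are hypothetical and are not defined by this specification. | ||||
| </t> | ||||

```python
# Constants copied from the XDR definitions of the file layout type.
NFL4_UFLG_DENSE = 0x00000001
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002
NFL4_UFLG_STRIPE_UNIT_SIZE_MASK = 0xFFFFFFC0


def encode_nfl_util(stripe_unit_size, dense, commit_thru_mds):
    """Pack the stripe unit size and the two flags into a 32-bit nfl_util4.

    Hypothetical helper for illustration; not defined by the protocol.
    """
    # The low six bits carry flags, so the size must be a nonzero
    # multiple of 64 that fits in the upper 26 bits.
    if stripe_unit_size < 64 or stripe_unit_size % 64 != 0:
        raise ValueError("stripe unit size must be a positive multiple of 64")
    if stripe_unit_size > NFL4_UFLG_STRIPE_UNIT_SIZE_MASK:
        raise ValueError("stripe unit size does not fit in nfl_util4")
    value = stripe_unit_size
    if dense:
        value |= NFL4_UFLG_DENSE
    if commit_thru_mds:
        value |= NFL4_UFLG_COMMIT_THRU_MDS
    return value


def decode_nfl_util(nfl_util):
    """Return (stripe_unit_size, dense, commit_thru_mds) for an nfl_util4."""
    return (
        nfl_util & NFL4_UFLG_STRIPE_UNIT_SIZE_MASK,
        bool(nfl_util & NFL4_UFLG_DENSE),
        bool(nfl_util & NFL4_UFLG_COMMIT_THRU_MDS),
    )
```

| <t> | ||||
| Because the flags occupy the low six bits, masking with NFL4_UFLG_STRIPE_UNIT_SIZE_MASK always yields a multiple of 64, which is why the sketch rejects any other size when encoding. | ||||
| </t> | ||||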
| <t> | ||||
| When LAYOUTGET returns a LAYOUT4_NFSV4_1_FILES layout | ||||
| (indicated in the loc_type field of the lo_content field), | ||||
| the loc_body field of the lo_content field | ||||
| contains a value of data type nfsv4_1_file_layout4. | ||||
| Among other content, nfsv4_1_file_layout4 has a storage | ||||
| device ID (field nfl_deviceid) of data type | ||||
| deviceid4. | ||||
| The GETDEVICEINFO operation maps a device ID to | ||||
| a storage device address (type device_addr4). When GETDEVICEINFO | ||||
| returns a device address with a layout type of LAYOUT4_NFSV4_1_FILES | ||||
| (the da_layout_type field), the da_addr_body field contains | ||||
| a value of data type nfsv4_1_file_layout_ds_addr4. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| typedef netaddr4 multipath_list4<>; | ||||
| /* | ||||
| * Encoded in the da_addr_body field of | ||||
| * data type device_addr4: | ||||
| */ | ||||
| struct nfsv4_1_file_layout_ds_addr4 { | ||||
| uint32_t nflda_stripe_indices<>; | ||||
| multipath_list4 nflda_multipath_ds_list<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The nfsv4_1_file_layout_ds_addr4 data type represents the | ||||
| device address. It is composed of two fields: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| nflda_multipath_ds_list: An array of lists of data servers, where | ||||
| each list can be one or more elements, and each element represents | ||||
| a data server address that may serve equally as the target of I/O operations (see | ||||
| <xref target="file_multipath" format="default"/>). | ||||
| The length of this array might be different from the stripe count. | ||||
| </li> | ||||
| <li> | ||||
| nflda_stripe_indices: An array of indices used to index into | ||||
| nflda_multipath_ds_list. The value of each element of nflda_stripe_indices <bcp14>MUST</bcp14> | ||||
| be less than the number of elements in nflda_multipath_ds_list. | ||||
| Each element of nflda_multipath_ds_list <bcp14>SHOULD</bcp14> be referred to by one | ||||
| or more elements of nflda_stripe_indices. | ||||
| The number of elements in | ||||
| nflda_stripe_indices is always equal to the stripe count. | ||||
| </li> | ||||
| </ol> | ||||
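| <t> | ||||
| The constraints on the two fields can be checked mechanically. The following Python sketch is one possible client-side sanity check, assuming the decoded device address is held as plain Python lists; the function name check_device_address and its return convention are hypothetical. | ||||
| </t> | ||||

```python
def check_device_address(stripe_indices, multipath_ds_list):
    """Validate a decoded nfsv4_1_file_layout_ds_addr4 (illustrative only).

    stripe_indices    -- list of ints (nflda_stripe_indices)
    multipath_ds_list -- list of lists of addresses (nflda_multipath_ds_list)

    Raises ValueError on a MUST violation and returns the sorted list of
    multipath_ds_list positions that no stripe index refers to (a SHOULD
    violation that a client may simply tolerate).
    """
    ds_count = len(multipath_ds_list)
    for pos, idx in enumerate(stripe_indices):
        # Each stripe index MUST be less than the number of list elements.
        if not 0 <= idx < ds_count:
            raise ValueError(
                "nflda_stripe_indices[%d] = %d is out of range" % (pos, idx))
    for pos, addresses in enumerate(multipath_ds_list):
        # Each multipath list holds one or more data server addresses.
        if len(addresses) == 0:
            raise ValueError("nflda_multipath_ds_list[%d] is empty" % pos)
    return sorted(set(range(ds_count)) - set(stripe_indices))
```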
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * Encoded in the loc_body field of | ||||
| * data type layout_content4: | ||||
| */ | ||||
| struct nfsv4_1_file_layout4 { | ||||
| deviceid4 nfl_deviceid; | ||||
| nfl_util4 nfl_util; | ||||
| uint32_t nfl_first_stripe_index; | ||||
| offset4 nfl_pattern_offset; | ||||
| nfs_fh4 nfl_fh_list<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| <t> | ||||
| The nfsv4_1_file_layout4 data type represents the layout. | ||||
| It is composed of the following fields: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| nfl_deviceid: The device ID that maps to a value of type | ||||
| nfsv4_1_file_layout_ds_addr4. | ||||
| </li> | ||||
| <li> | ||||
| nfl_util: Like the nflh_util field of data type nfsv4_1_file_layouthint4, | ||||
| a compact representation of how the data on a file | ||||
| on each data server is packed, whether the client should send | ||||
| COMMIT operations to the metadata server or data server, and the | ||||
| stripe unit size. If a server returns two or | ||||
| more overlapping layouts, each stripe unit size in | ||||
| each overlapping layout <bcp14>MUST</bcp14> be the same. | ||||
| </li> | ||||
| <li> | ||||
| nfl_first_stripe_index: The index of the first element | ||||
| of the nflda_stripe_indices array to use. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| nfl_pattern_offset: | ||||
| This field is the logical offset into the file | ||||
| where the striping pattern starts. It is required for | ||||
| converting the client's logical I/O offset (e.g., the current | ||||
| offset in a POSIX file descriptor before the read() or write() | ||||
| system call is sent) into the stripe unit number (see | ||||
| <xref target="SUi" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| If dense packing is used, then nfl_pattern_offset | ||||
| is also needed to convert the client's logical | ||||
| I/O offset to an offset on the file on the data | ||||
| server corresponding to the stripe unit number (see <xref target="sparse_dense" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| Note that nfl_pattern_offset is not always the same as | ||||
| lo_offset. For example, via the LAYOUTGET operation, | ||||
| a client might request a layout starting at offset 1000 of a | ||||
| file that has its striping pattern start at offset zero. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| nfl_fh_list: An array of data server filehandles for each | ||||
| list of data servers in each element of the nflda_multipath_ds_list | ||||
| array. The number of elements in | ||||
| nfl_fh_list depends on whether sparse or dense packing | ||||
| is being used. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| If sparse packing is being used, the number of elements in | ||||
| nfl_fh_list <bcp14>MUST</bcp14> be one of three values: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Zero. This means that filehandles used | ||||
| for each data server are the same as the | ||||
| filehandle returned by the OPEN operation | ||||
| from the metadata server. | ||||
| </li> | ||||
| <li> | ||||
| One. This means that every data server uses | ||||
| the same filehandle: what is specified in | ||||
| nfl_fh_list[0]. | ||||
| </li> | ||||
| <li> | ||||
| The same as the number of elements in | ||||
| nflda_multipath_ds_list. Thus, in this case, | ||||
| when sending an I/O operation to any data server in | ||||
| nflda_multipath_ds_list[X], the filehandle | ||||
| in nfl_fh_list[X] <bcp14>MUST</bcp14> be used. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| See the discussion on sparse packing in <xref target="sparse_dense" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| If dense packing is being used, the number of elements | ||||
| in nfl_fh_list <bcp14>MUST</bcp14> be the same as the number | ||||
| of elements in nflda_stripe_indices. Thus, | ||||
| when sending an I/O operation to any data server in | ||||
| nflda_multipath_ds_list[nflda_stripe_indices[Y]], | ||||
| the filehandle in nfl_fh_list[Y] <bcp14>MUST</bcp14> be | ||||
| used. In addition, any time there exists i | ||||
| and j, (i != j), such that the intersection of | ||||
| nflda_multipath_ds_list[nflda_stripe_indices[i]] | ||||
| and nflda_multipath_ds_list[nflda_stripe_indices[j]] | ||||
| is not empty, then nfl_fh_list[i] <bcp14>MUST NOT</bcp14> equal | ||||
| nfl_fh_list[j]. In other words, when dense packing | ||||
| is being used, if a data server appears in two or more | ||||
| units of a striping pattern, each reference to | ||||
| the data server <bcp14>MUST</bcp14> use a different filehandle. | ||||
| </t> | ||||
| <t> | ||||
| Indeed, if there are multiple striping patterns, | ||||
| as indicated by the presence of multiple objects of | ||||
| data type layout4 (returned in either one or multiple | ||||
| LAYOUTGET operations), and a data server is the target | ||||
| of a unit of one pattern and another unit of another | ||||
| pattern, then each reference to each data server <bcp14>MUST</bcp14> | ||||
| use a different filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See the discussion on dense packing in <xref target="sparse_dense" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ol> | ||||
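| <t> | ||||
| The rules for the length and contents of nfl_fh_list within a single layout can be summarized in a short Python sketch. It assumes that filehandles and addresses are comparable Python values and that the layout and device address have already been decoded into lists; the function name check_fh_list is hypothetical. The rule that spans multiple layouts is not covered. | ||||
| </t> | ||||

```python
def check_fh_list(dense, fh_list, stripe_indices, multipath_ds_list):
    """Raise ValueError if nfl_fh_list breaks the rules for one layout.

    Illustrative only: 'dense' is the NFL4_UFLG_DENSE bit of nfl_util.
    """
    if not dense:
        # Sparse packing: zero, one, or one filehandle per multipath list.
        if len(fh_list) not in (0, 1, len(multipath_ds_list)):
            raise ValueError("bad nfl_fh_list length for sparse packing")
        return
    # Dense packing: exactly one filehandle per stripe index.
    if len(fh_list) != len(stripe_indices):
        raise ValueError("bad nfl_fh_list length for dense packing")
    # Two stripe positions that can reach a common data server
    # MUST NOT share a filehandle.
    for i in range(len(stripe_indices)):
        for j in range(i + 1, len(stripe_indices)):
            servers_i = set(multipath_ds_list[stripe_indices[i]])
            servers_j = set(multipath_ds_list[stripe_indices[j]])
            if servers_i & servers_j and fh_list[i] == fh_list[j]:
                raise ValueError(
                    "nfl_fh_list[%d] and nfl_fh_list[%d] must differ" % (i, j))
```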
| <t> | ||||
| The details on the interpretation of the layout are in | ||||
| <xref target="file_layout_interpret" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] "File Layout Data Types" "file_data_types" --> | ||||
| <section anchor="file_layout_interpret" numbered="true" toc="default"> | ||||
| <name>Interpreting the File Layout</name> | ||||
| <section anchor="SUi" numbered="true" toc="default"> | ||||
| <name>Determining the Stripe Unit Number</name> | ||||
| <t> | ||||
| To find the stripe unit number that corresponds to the client's | ||||
| logical file offset, the pattern offset will also be used. The | ||||
| i'th stripe unit (SUi) is: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| relative_offset = file_offset - nfl_pattern_offset; | ||||
| SUi = floor(relative_offset / stripe_unit_size);]]></sourcecode> | ||||
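| <t> | ||||
| A direct Python transcription of this calculation follows. The function name is hypothetical, and the range check is an added assumption: the pseudocode does not say what a client does with an offset that precedes the pattern offset. | ||||
| </t> | ||||

```python
def stripe_unit_number(file_offset, nfl_pattern_offset, stripe_unit_size):
    """Return SUi for a logical file offset (illustrative helper)."""
    if file_offset < nfl_pattern_offset:
        # Assumption: offsets before the striping pattern are rejected.
        raise ValueError("offset precedes the start of the striping pattern")
    relative_offset = file_offset - nfl_pattern_offset
    # Integer division implements floor() for non-negative operands.
    return relative_offset // stripe_unit_size
```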
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Interpreting the File Layout Using Sparse Packing</name> | ||||
| <t> | ||||
| When sparse packing is used, the algorithm for determining the filehandle and set | ||||
| of data-server network addresses to write stripe unit i | ||||
| (SUi) to is: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| stripe_count = number of elements in nflda_stripe_indices; | ||||
| j = (SUi + nfl_first_stripe_index) % stripe_count; | ||||
| idx = nflda_stripe_indices[j]; | ||||
| fh_count = number of elements in nfl_fh_list; | ||||
| ds_count = number of elements in nflda_multipath_ds_list; | ||||
| switch (fh_count) { | ||||
| case ds_count: | ||||
| fh = nfl_fh_list[idx]; | ||||
| break; | ||||
| case 1: | ||||
| fh = nfl_fh_list[0]; | ||||
| break; | ||||
| case 0: | ||||
| fh = filehandle returned by OPEN; | ||||
| break; | ||||
| default: | ||||
| throw a fatal exception; | ||||
| break; | ||||
| } | ||||
| address_list = nflda_multipath_ds_list[idx];]]></sourcecode> | ||||
| <t> | ||||
| The client would then select a data server from address_list, and | ||||
| send a READ or WRITE operation using the filehandle specified in fh. | ||||
| </t> | ||||
| <t> | ||||
| Consider the following example: | ||||
| </t> | ||||
| <t> | ||||
| Suppose we have a device address consisting of seven | ||||
| data servers, arranged in three equivalence (<xref target="file_multipath" format="default"/>) classes: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| { A, B, C, D }, { E }, { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| where A through G are network addresses. | ||||
| </t> | ||||
| <t> | ||||
| Then | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| i.e., | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list[0] = { A, B, C, D } | ||||
| </li> | ||||
| <li> | ||||
| nflda_multipath_ds_list[1] = { E } | ||||
| </li> | ||||
| <li> | ||||
| nflda_multipath_ds_list[2] = { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Suppose the striping index array is: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_stripe_indices<> = { 2, 0, 1, 0 } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Now suppose the client gets a layout that has a device ID | ||||
| that maps to the above device address. The initial index contains | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_first_stripe_index = 2, | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| and the filehandle list is | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_fh_list = { 0x36, 0x87, 0x67 }. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the client wants to write to SU0, the | ||||
| set of valid { network address, filehandle } combinations | ||||
| for SU0 is determined by: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_first_stripe_index = 2 | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| So | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| idx = nflda_stripe_indices[(0 + 2) % 4] | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| = nflda_stripe_indices[2] | ||||
| </li> | ||||
| <li> | ||||
| = 1 | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| So | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list[1] = { E } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| and | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_fh_list[1] = { 0x87 } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The client can thus write SU0 to { 0x87, { E } }. | ||||
| </t> | ||||
| <t> | ||||
| The destinations of the first 13 stripe units are: | ||||
| </t> | ||||
| <table align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">SUi</th> | ||||
| <th align="left">filehandle</th> | ||||
| <th align="left">data servers</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">0</td> | ||||
| <td align="left">87 </td> | ||||
| <td align="left">E </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">1</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">2</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">3</td> | ||||
| <td align="left">36 </td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">4</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">5</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">6</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">7</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">8</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">9</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">10</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">11</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">12</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
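| <t> | ||||
| The table can be reproduced with a short Python sketch of the sparse packing algorithm. The function name sparse_target and the open_fh parameter (standing for the filehandle returned by OPEN) are hypothetical; the data values are those of the example above. | ||||
| </t> | ||||

```python
def sparse_target(su_i, first_stripe_index, stripe_indices,
                  multipath_ds_list, fh_list, open_fh=None):
    """Return (filehandle, address_list) for stripe unit su_i, sparse packing."""
    stripe_count = len(stripe_indices)
    j = (su_i + first_stripe_index) % stripe_count
    idx = stripe_indices[j]
    fh_count = len(fh_list)
    ds_count = len(multipath_ds_list)
    if fh_count == ds_count:
        fh = fh_list[idx]      # one filehandle per multipath list
    elif fh_count == 1:
        fh = fh_list[0]        # one filehandle shared by all data servers
    elif fh_count == 0:
        fh = open_fh           # filehandle returned by OPEN
    else:
        raise ValueError("invalid nfl_fh_list length for sparse packing")
    return fh, multipath_ds_list[idx]


# Values from the example above.
ds_list = [["A", "B", "C", "D"], ["E"], ["F", "G"]]
stripe_indices = [2, 0, 1, 0]
fh_list = [0x36, 0x87, 0x67]

for su in range(13):
    fh, servers = sparse_target(su, 2, stripe_indices, ds_list, fh_list)
    print(su, format(fh, "x"), ",".join(servers))
```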
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Interpreting the File Layout Using Dense Packing</name> | ||||
| <t> | ||||
| When dense packing is used, the algorithm for determining the filehandle and set | ||||
| of data server network addresses to write stripe unit i (SUi) to is: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| stripe_count = number of elements in nflda_stripe_indices; | ||||
| j = (SUi + nfl_first_stripe_index) % stripe_count; | ||||
| idx = nflda_stripe_indices[j]; | ||||
| fh_count = number of elements in nfl_fh_list; | ||||
| ds_count = number of elements in nflda_multipath_ds_list; | ||||
| switch (fh_count) { | ||||
| case stripe_count: | ||||
| fh = nfl_fh_list[j]; | ||||
| break; | ||||
| default: | ||||
| throw a fatal exception; | ||||
| break; | ||||
| } | ||||
| address_list = nflda_multipath_ds_list[idx];]]></sourcecode> | ||||
| <t> | ||||
| The client would then select a data server from address_list, and | ||||
| send a READ or WRITE operation using the filehandle specified in fh. | ||||
| </t> | ||||
| <t> | ||||
| Consider the following example (which is the same | ||||
| as the sparse packing example, except for the | ||||
| filehandle list): | ||||
| </t> | ||||
| <t> | ||||
| Suppose we have a device address consisting of seven | ||||
| data servers, arranged in three equivalence (<xref target="file_multipath" format="default"/>) classes: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| { A, B, C, D }, { E }, { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| where A through G are network addresses. | ||||
| </t> | ||||
| <t> | ||||
| Then | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| i.e., | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list[0] = { A, B, C, D } | ||||
| </li> | ||||
| <li> | ||||
| nflda_multipath_ds_list[1] = { E } | ||||
| </li> | ||||
| <li> | ||||
| nflda_multipath_ds_list[2] = { F, G } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Suppose the striping index array is: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_stripe_indices<> = { 2, 0, 1, 0 } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Now suppose the client gets a layout that has a device ID | ||||
| that maps to the above device address. The initial index contains | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_first_stripe_index = 2, | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| and | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_fh_list = { 0x67, 0x37, 0x87, 0x36 }. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The interesting examples for dense packing are | ||||
| SU1 and SU3 because each stripe unit refers to the | ||||
| same data server list, yet each stripe unit <bcp14>MUST</bcp14> use a different filehandle. | ||||
| If the client wants to write to SU1, the | ||||
| set of valid { network address, filehandle } combinations | ||||
| for SU1 is determined by: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> nfl_first_stripe_index = 2 </li> | ||||
| </ul> | ||||
| <t> | ||||
| So | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| <t> j = (1 + 2) % 4 = 3 </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> idx = nflda_stripe_indices[j] </li> | ||||
| <li> = nflda_stripe_indices[3] </li> | ||||
| <li> = 0 </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| So | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nflda_multipath_ds_list[0] = { A, B, C, D } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| and | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| nfl_fh_list[3] = { 0x36 } | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The client can thus write SU1 to { 0x36, { A, B, C, D } }. | ||||
| </t> | ||||
| <t> | ||||
| For SU3, j = (3 + 2) % 4 = 1, and nflda_stripe_indices[1] = 0. | ||||
| Then nflda_multipath_ds_list[0] = { A, B, C, D }, and | ||||
| nfl_fh_list[1] = 0x37. The client can thus write SU3 to | ||||
| { 0x37, { A, B, C, D } }. | ||||
| </t> | ||||
| <t> | ||||
| The destinations of the first 13 stripe units are: | ||||
| </t> | ||||
| <table align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">SUi</th> | ||||
| <th align="left">filehandle</th> | ||||
| <th align="left">data servers</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">0</td> | ||||
| <td align="left"> 87 </td> | ||||
| <td align="left"> E </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">1</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">2</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">3</td> | ||||
| <td align="left">37 </td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">4</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">5</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">6</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">7</td> | ||||
| <td align="left">37</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">8</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">9</td> | ||||
| <td align="left">36</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">10</td> | ||||
| <td align="left">67</td> | ||||
| <td align="left">F,G</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">11</td> | ||||
| <td align="left">37</td> | ||||
| <td align="left">A,B,C,D</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">12</td> | ||||
| <td align="left">87</td> | ||||
| <td align="left">E</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
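| <t> | ||||
| The dense packing table can likewise be reproduced with a short Python sketch. The function name dense_target is hypothetical; the data values are those of the example above. The only difference from sparse packing is that the filehandle is selected by the position j in the stripe rather than by the data server index. | ||||
| </t> | ||||

```python
def dense_target(su_i, first_stripe_index, stripe_indices,
                 multipath_ds_list, fh_list):
    """Return (filehandle, address_list) for stripe unit su_i, dense packing."""
    stripe_count = len(stripe_indices)
    if len(fh_list) != stripe_count:
        raise ValueError("invalid nfl_fh_list length for dense packing")
    j = (su_i + first_stripe_index) % stripe_count
    idx = stripe_indices[j]
    # The filehandle is indexed by j, so two stripe positions that share
    # a data server list still get distinct filehandles.
    return fh_list[j], multipath_ds_list[idx]


# Values from the example above.
ds_list = [["A", "B", "C", "D"], ["E"], ["F", "G"]]
stripe_indices = [2, 0, 1, 0]
fh_list = [0x67, 0x37, 0x87, 0x36]

for su in range(13):
    fh, servers = dense_target(su, 2, stripe_indices, ds_list, fh_list)
    print(su, format(fh, "x"), ",".join(servers))
```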
| </section> | ||||
| <section anchor="sparse_dense" numbered="true" toc="default"> | ||||
| <name>Sparse and Dense Stripe Unit Packing</name> | ||||
| <t> | ||||
| The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util of the | ||||
| data type nfsv4_1_file_layouthint4 and field nfl_util of | ||||
| data type nfsv4_1_file_layout4) specifies how the data | ||||
| is packed within the | ||||
| data file on a data server. It allows for two different data | ||||
| packings: sparse and dense. The packing type determines the | ||||
| calculation that will be made to map the client-visible file offset | ||||
| to the offset within the data file located on the data server. | ||||
| </t> | ||||
| <t> | ||||
| If nfl_util & NFL4_UFLG_DENSE is zero, this means that | ||||
| sparse packing is being used. Hence, the logical offsets of the | ||||
| file as viewed by a client | ||||
| sending READs and WRITEs directly to the metadata server | ||||
| are the same offsets each data server uses when storing | ||||
| a stripe unit. The effect then, for striping patterns | ||||
| consisting of at least two stripe units, is for each | ||||
| data server file to be sparse or "holey". So for example, | ||||
| suppose there is a pattern with three stripe units, the stripe unit | ||||
| size is 4096 bytes, and there are three data servers in | ||||
| the pattern. Then, the file in data server 1 will have | ||||
| stripe units 0, 3, 6, 9, ... filled; data server 2's | ||||
| file will have stripe units 1, 4, 7, 10, ... filled; | ||||
| and data server 3's file will have stripe units 2, | ||||
| 5, 8, 11, ... filled. The unfilled stripe units of | ||||
| each file will be holes; hence, the files in each data | ||||
| server are sparse. | ||||
| </t> | ||||
| <t> | ||||
| If sparse packing is being used and a client attempts I/O to one of | ||||
| the holes, then an error <bcp14>MUST</bcp14> be | ||||
| returned by the data server. Using the above example, if data server 3 received a READ or WRITE operation for stripe unit 4, the data server | ||||
| would return NFS4ERR_PNFS_IO_HOLE. Thus, | ||||
| data servers need to understand the striping pattern in order | ||||
| to support sparse packing. | ||||
| </t> | ||||
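| <t> | ||||
| One way a data server might detect such a hole is sketched below in Python. How a data server learns the striping pattern is implementation specific; the sketch simply assumes that it knows the stripe index array, the first stripe index, and its own position (my_index) in the multipath list. The function name is hypothetical. | ||||
| </t> | ||||

```python
def is_sparse_hole(su_i, my_index, first_stripe_index, stripe_indices):
    """Return True if stripe unit su_i is a hole on this data server.

    my_index is this server's position in nflda_multipath_ds_list
    (an assumption about what the data server knows).
    """
    j = (su_i + first_stripe_index) % len(stripe_indices)
    return stripe_indices[j] != my_index


# The example above: three stripe units spread over three data servers.
# Data server 3 is at position 2 and holds stripe units 2, 5, 8, 11, ...
pattern = [0, 1, 2]
print(is_sparse_hole(4, 2, 0, pattern))   # stripe unit 4 is a hole there
print(is_sparse_hole(5, 2, 0, pattern))   # stripe unit 5 is not
```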
| <t> | ||||
| If nfl_util & NFL4_UFLG_DENSE is one, this means that | ||||
| dense packing is being used, and the data server files have no holes. | ||||
| Dense packing might be selected because the data server does not | ||||
| (efficiently) support holey files or because the data server | ||||
| cannot recognize read-ahead unless there are no holes. | ||||
| If dense packing is indicated in the layout, | ||||
| the data files will be packed. Using the | ||||
| same striping pattern and stripe unit size that were used for | ||||
| the sparse packing example, the corresponding dense packing example would have | ||||
| all stripe units of all data files filled as follows: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Logical stripe units 0, 3, 6, ... of the file would live on | ||||
| stripe units 0, 1, 2, ... of the file of data server 1. | ||||
| </li> | ||||
| <li> | ||||
| Logical stripe units 1, 4, 7, ... of the file would live on | ||||
| stripe units 0, 1, 2, ... of the file of data server 2. | ||||
| </li> | ||||
| <li> | ||||
| Logical stripe units 2, 5, 8, ... of the file would live on | ||||
| stripe units 0, 1, 2, ... of the file of data server 3. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Because dense packing does not leave holes on the data servers, | ||||
| the pNFS client is allowed to write to any offset of any data file of | ||||
| any data server in the stripe. Thus, the data servers need not know | ||||
| the file's striping pattern. | ||||
| </t> | ||||
| <t> | ||||
| The calculation to determine the byte offset within the data file | ||||
| for dense data server layouts is: | ||||
| </t> | ||||
| <sourcecode type="pseudocode"><![CDATA[ | ||||
| stripe_width = stripe_unit_size * N; | ||||
| where N = number of elements in nflda_stripe_indices. | ||||
| relative_offset = file_offset - nfl_pattern_offset; | ||||
| data_file_offset = floor(relative_offset / stripe_width) | ||||
| * stripe_unit_size | ||||
| + relative_offset % stripe_unit_size]]></sourcecode> | ||||
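| <t> | ||||
| The same calculation in Python, as a minimal sketch with a hypothetical function name. The sample values assume the three-unit pattern with a 4096-byte stripe unit used in the packing example, and a pattern offset of zero. | ||||
| </t> | ||||

```python
def dense_data_file_offset(file_offset, nfl_pattern_offset,
                           stripe_unit_size, stripe_count):
    """Map a logical file offset to an offset in a dense data file."""
    stripe_width = stripe_unit_size * stripe_count
    relative_offset = file_offset - nfl_pattern_offset
    # Whole stripes that precede this offset each contribute one stripe
    # unit to the data file; the remainder is the offset within the unit.
    return ((relative_offset // stripe_width) * stripe_unit_size
            + relative_offset % stripe_unit_size)


# Logical stripe unit 3 starts at byte 12288 and lands in stripe unit 1
# (byte 4096) of the data file on the first data server.
print(dense_data_file_offset(12288, 0, 4096, 3))
```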
| <t> | ||||
| If dense packing is being used, and a data server appears | ||||
| more than once in a striping pattern, then to distinguish | ||||
| one stripe unit from another, the data server <bcp14>MUST</bcp14> use a | ||||
| different filehandle. Let's suppose there are two data | ||||
| servers. Logical stripe units 0, 3, 6 are served by | ||||
| data server 1; logical stripe units 1, 4, 7 are served | ||||
| by data server 2; and logical stripe units 2, 5, 8 are | ||||
| also served by data server 2. Unless data server 2 has | ||||
| two filehandles (each referring to a different data | ||||
| file), then, for example, a write to logical stripe | ||||
| unit 1 overwrites the write to logical stripe unit 2 | ||||
| because both logical stripe units are located in the | ||||
| same stripe unit (0) of data server 2. | ||||
| </t> | ||||
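| <t> | ||||
| The collision can be seen by computing where each logical stripe unit lands. The Python sketch below assumes the two-server pattern just described, written as a stripe index array of { 0, 1, 1 }, with a first stripe index of zero and a 4096-byte stripe unit; the function name is hypothetical. | ||||
| </t> | ||||

```python
STRIPE_UNIT_SIZE = 4096
# Data server 1 is index 0; data server 2 is index 1 and appears twice.
STRIPE_INDICES = [0, 1, 1]


def dense_location(su_i):
    """Return (server_index, stripe_position, data_file_offset) for su_i."""
    stripe_count = len(STRIPE_INDICES)
    j = su_i % stripe_count                      # nfl_first_stripe_index = 0
    data_file_offset = (su_i // stripe_count) * STRIPE_UNIT_SIZE
    return STRIPE_INDICES[j], j, data_file_offset


server1, pos1, off1 = dense_location(1)
server2, pos2, off2 = dense_location(2)
# Both units map to the same server and the same data file offset, so
# only a different filehandle (nfl_fh_list[1] versus nfl_fh_list[2])
# keeps one write from overwriting the other.
print(server1 == server2, off1 == off2, pos1 != pos2)
```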
| </section> | ||||
| </section> | ||||
| <!-- [auth] "Interpreting the File Layout" anchor="file_layout_interpret" --> | ||||
| <section anchor="file_multipath" numbered="true" toc="default"> | ||||
| <name>Data Server Multipathing</name> | ||||
| <t> | ||||
| The NFSv4.1 file layout supports multipathing to | ||||
| multiple data server addresses. | ||||
| Data-server-level multipathing is used for | ||||
| bandwidth scaling via trunking (<xref target="Trunking" format="default"/>) and for higher availability of use in the case of | ||||
| a data-server failure. Multipathing allows the client | ||||
| to switch to another data server address which may be that | ||||
| of another data server that is exporting the | ||||
| same data stripe unit, without having to contact the | ||||
| metadata server for a new layout. | ||||
| </t> | ||||
| <t> | ||||
| To support data server multipathing, each element of | ||||
| the nflda_multipath_ds_list contains an array of one or | ||||
| more data server network addresses. This array (data | ||||
| type multipath_list4) represents a list of data servers | ||||
| (each identified by a network address), with the possibility | ||||
| that some data servers will appear in the list multiple times. | ||||
| </t> | ||||
| <t> | ||||
| The client is free to use any of the network addresses | ||||
| as a destination to send data server requests. If some | ||||
| network addresses are less optimal paths to the data than | ||||
| others, then the MDS <bcp14>SHOULD NOT</bcp14> include those network | ||||
| addresses in an element of nflda_multipath_ds_list. If | ||||
| less optimal network addresses exist to provide failover, the | ||||
| <bcp14>RECOMMENDED</bcp14> method to offer the addresses is | ||||
| to provide them in a replacement device-ID-to-device-address | ||||
| mapping, or a replacement device ID. When | ||||
| a client finds that no data server in an element of | ||||
| nflda_multipath_ds_list responds, it <bcp14>SHOULD</bcp14> send a | ||||
| GETDEVICEINFO to attempt to replace the existing | ||||
| device-ID-to-device-address mappings. If the MDS detects | ||||
| that all data servers represented by an element of | ||||
| nflda_multipath_ds_list are unavailable, the MDS <bcp14>SHOULD</bcp14> | ||||
| send a CB_NOTIFY_DEVICEID (if the client has indicated | ||||
| it wants device ID notifications for changed device IDs) | ||||
| to change the device-ID-to-device-address mappings to | ||||
| the available data servers. If the device ID itself will | ||||
| be replaced, the MDS <bcp14>SHOULD</bcp14> recall all layouts with the | ||||
| device ID, and thus force the client to get new layouts | ||||
| and device ID mappings via LAYOUTGET and GETDEVICEINFO. | ||||
| </t> | ||||
| <t> | ||||
| Generally, if two network addresses appear in an element | ||||
| of nflda_multipath_ds_list, they will designate the same | ||||
| data server, and the two data server addresses will | ||||
| support the implementation of | ||||
| client ID or session trunking (the latter is <bcp14>RECOMMENDED</bcp14>) | ||||
| as defined in <xref target="Trunking" format="default"/>. The two | ||||
| data server addresses will share the same server owner | ||||
| or major ID of the server owner. It is not always necessary for the | ||||
| two data server addresses to designate the same server | ||||
| with trunking being used. For example, | ||||
| the data could be read-only, and the data consist of | ||||
| exact replicas. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="ds_ops" numbered="true" toc="default"> | ||||
| <name>Operations Sent to NFSv4.1 Data Servers</name> | ||||
| <t> | ||||
| Clients accessing data on an NFSv4.1 data server <bcp14>MUST</bcp14> send | ||||
| only the NULL procedure and COMPOUND procedures whose | ||||
| operations are taken only from two restricted | ||||
| subsets of the operations defined as valid NFSv4.1 | ||||
| operations. Clients <bcp14>MUST</bcp14> use the filehandle specified | ||||
| by the layout when accessing data on NFSv4.1 data | ||||
| servers. | ||||
| </t> | ||||
| <t> | ||||
| The first of these operation subsets consists of management operations. | ||||
| This subset consists of the BACKCHANNEL_CTL, BIND_CONN_TO_SESSION, CREATE_SESSION, | ||||
| DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID, | ||||
| SECINFO_NO_NAME, SET_SSV, and SEQUENCE operations. | ||||
| The client may use these operations in order to set | ||||
| up and maintain the appropriate client IDs, | ||||
| sessions, and security contexts involved in communication with the data | ||||
| server. Henceforth, these will be referred to as | ||||
| data-server housekeeping operations. | ||||
| </t> | ||||
| <t> | ||||
| The second subset consists of COMMIT, READ, WRITE, and PUTFH. | ||||
| These operations <bcp14>MUST</bcp14> be used with a current filehandle specified by the | ||||
| layout. In the case of PUTFH, the new current filehandle <bcp14>MUST</bcp14> be | ||||
| one taken from the layout. Henceforth, these will be referred to as data-server | ||||
| I/O operations. As described in <xref target="layout_semantics" format="default"/>, | ||||
| a client <bcp14>MUST NOT</bcp14> send an I/O to a data server for which it does not hold a | ||||
| valid layout; the data server <bcp14>MUST</bcp14> reject such an I/O. | ||||
| </t> | ||||
| <t> | ||||
| Unless the server has a concurrent non-data-server | ||||
| personality -- i.e., EXCHANGE_ID results returned | ||||
| (EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS) | ||||
| or (EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS) see | ||||
| <xref target="pnfs_session_stuff" format="default"/> -- any attempted use of | ||||
| operations against a data server other than those specified in the two | ||||
| subsets above <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_NOTSUPP to the client. | ||||
| </t> | ||||
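The operation check described above can be sketched as follows. This is a minimal illustration, assuming a server with only a data-server personality; the function name check_ds_only_operation and the use of strings for operation names and status codes are hypothetical conveniences, not part of the protocol.

```python
# Operation names are copied from the two subsets listed above.
HOUSEKEEPING_OPS = frozenset({
    "BACKCHANNEL_CTL", "BIND_CONN_TO_SESSION", "CREATE_SESSION",
    "DESTROY_CLIENTID", "DESTROY_SESSION", "EXCHANGE_ID",
    "SECINFO_NO_NAME", "SET_SSV", "SEQUENCE",
})
IO_OPS = frozenset({"COMMIT", "READ", "WRITE", "PUTFH"})


def check_ds_only_operation(op_name):
    """Return the status a data-server-only personality gives for op_name.

    Hypothetical helper: it models only the operation-subset check and
    does not validate filehandles, stateids, or layouts.
    """
    if op_name in HOUSEKEEPING_OPS or op_name in IO_OPS:
        return "NFS4_OK"
    return "NFS4ERR_NOTSUPP"
```

For example, check_ds_only_operation("GETATTR") yields "NFS4ERR_NOTSUPP", since GETATTR is in neither subset.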
| <t> | ||||
| When the server has concurrent data-server and | ||||
| non-data-server personalities, each COMPOUND sent by the | ||||
| client <bcp14>MUST</bcp14> be constructed | ||||
| so that it is appropriate to one of the two personalities, and it | ||||
| <bcp14>MUST NOT</bcp14> contain operations directed to a mix of those | ||||
| personalities. The server <bcp14>MUST</bcp14> enforce this. To understand | ||||
| the constraints, operations within a COMPOUND are divided into | ||||
| the following three classes: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| An operation that is ambiguous regarding its personality | ||||
| assignment. This includes all of the data-server | ||||
| housekeeping operations. Additionally, if the | ||||
| server has assigned filehandles so that the ones defined | ||||
| by the layout are the same as those used by the metadata | ||||
| server, all operations using such filehandles are within this | ||||
| class, with the following exception. The exception is | ||||
| that if the operation uses a stateid that is incompatible with a | ||||
| data-server personality (e.g., a special stateid or the | ||||
| stateid has a non-zero "seqid" field, see | ||||
| <xref target="global_stateid" format="default"/>), the operation is in class 3, | ||||
| as described below. A COMPOUND containing | ||||
| multiple class 1 operations (and operations of no other | ||||
| class) <bcp14>MAY</bcp14> be sent to a server with concurrent data-server | ||||
| and non-data-server personalities. | ||||
| </li> | ||||
| <li> | ||||
| An operation that is unambiguously referable to the data-server | ||||
| personality. This includes data-server I/O operations where the | ||||
| filehandle is one that can only be validly directed to the | ||||
| data-server personality. | ||||
| </li> | ||||
| <li> | ||||
| An operation that is unambiguously referable to the non-data-server | ||||
| personality. This includes all COMPOUND operations that are | ||||
| neither data-server housekeeping nor data-server I/O | ||||
| operations, plus data-server I/O operations where the | ||||
| current fh (or the one to be made the current fh in the | ||||
| case of PUTFH) is only valid on the metadata | ||||
| server or where a stateid is used that is incompatible | ||||
| with the data server, i.e., is a special stateid or has | ||||
| a non-zero seqid value. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| When a COMPOUND first executes an operation from class 3 above, | ||||
| it acts as a normal COMPOUND on any other server, and the | ||||
| data-server personality ceases to be relevant. | ||||
| There are no special restrictions on the | ||||
| operations in the COMPOUND to limit them to those for | ||||
| a data server. When a PUTFH is done, filehandles | ||||
| derived from the layout are not valid. If their format | ||||
| is not normally acceptable, then NFS4ERR_BADHANDLE <bcp14>MUST</bcp14> | ||||
| result. Similarly, current filehandles for other operations | ||||
| do not accept filehandles derived from layouts and are not | ||||
| normally usable on the metadata server. Using these | ||||
| will result in NFS4ERR_STALE. | ||||
| </t> | ||||
| <t> | ||||
| When a COMPOUND first executes an operation from class 2, | ||||
| which would be PUTFH where the filehandle | ||||
| is one from a layout, the COMPOUND henceforth is interpreted | ||||
| with respect to the data-server personality. | ||||
| Operations outside the two classes discussed | ||||
| above <bcp14>MUST</bcp14> result in NFS4ERR_NOTSUPP. Filehandles | ||||
| are validated using the rules of the data server, | ||||
| resulting in NFS4ERR_BADHANDLE and/or NFS4ERR_STALE | ||||
| even when they would not normally do so when addressed | ||||
| to the non-data-server personality. Stateids must obey | ||||
| the rules of the data server in that any use of special | ||||
| stateids or stateids with non-zero seqid values must | ||||
| result in NFS4ERR_BAD_STATEID. | ||||
| </t> | ||||
| <t> | ||||
| Until the server first executes an operation from class 2 | ||||
| or class 3, the client <bcp14>MUST NOT</bcp14> depend on the operation | ||||
| being executed by either the data-server or the non-data-server | ||||
| personality. The server <bcp14>MUST</bcp14> pick one personality consistently | ||||
| for a given COMPOUND, with the only possible transition being | ||||
| a single one when the first operation from class 2 or class 3 | ||||
| is executed. | ||||
| </t> | ||||
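One way to picture the single permitted transition is the sketch below. It assumes each operation has already been assigned to one of the three classes; the names OpClass and resolve_personality are hypothetical, and the ValueError merely stands in for the protocol errors (such as NFS4ERR_NOTSUPP, NFS4ERR_BADHANDLE, or NFS4ERR_STALE) that a real server would return for the offending operation.

```python
from enum import Enum


class OpClass(Enum):
    AMBIGUOUS = 1        # class 1: valid for either personality
    DATA_SERVER = 2      # class 2: only the data-server personality
    NON_DATA_SERVER = 3  # class 3: only the non-data-server personality


def resolve_personality(op_classes):
    """Return "ds", "non-ds", or None (undetermined) for one COMPOUND.

    The personality is fixed by the first class 2 or class 3 operation
    and may not change afterwards.
    """
    personality = None
    for op_class in op_classes:
        if op_class is OpClass.AMBIGUOUS:
            continue  # class 1 operations never force a choice
        wanted = "ds" if op_class is OpClass.DATA_SERVER else "non-ds"
        if personality is None:
            personality = wanted  # the single permitted transition
        elif personality != wanted:
            raise ValueError("COMPOUND mixes personalities")
    return personality
```

A COMPOUND made up only of class 1 operations returns None here, which matches the rule that the client cannot depend on either personality until a class 2 or class 3 operation is executed.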
| <t> | ||||
| Because of the complexity induced by assigning filehandles so | ||||
| they can be used on both a data server and a metadata server, it | ||||
| is <bcp14>RECOMMENDED</bcp14> that where the same server can have both | ||||
| personalities, the server assign separate unique filehandles | ||||
| to both personalities. This makes it unambiguous for which server | ||||
| a given request is intended. | ||||
| </t> | ||||
| <t> | ||||
| GETATTR and SETATTR <bcp14>MUST</bcp14> be directed to the metadata | ||||
| server. In the case of a SETATTR of the size attribute, | ||||
| the control protocol is responsible for propagating size | ||||
| updates/truncations to the data servers. In the case of | ||||
| extending WRITEs to the data servers, the new size must | ||||
| be visible on the metadata server once a LAYOUTCOMMIT | ||||
| has completed (see <xref target="general_layoutcommit" format="default"/>). <xref target="component_file_size" format="default"/> describes the | ||||
| mechanism by which the client is to handle data-server | ||||
| files that do not reflect the metadata server's size. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="commit_thru_mds" numbered="true" toc="default"> | ||||
| <name>COMMIT through Metadata Server</name> | ||||
| <t> | ||||
| The file layout provides two alternate means of providing for the | ||||
| commit of data written through data servers. The flag | ||||
| NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout | ||||
| (data type nfsv4_1_file_layout4) | ||||
| is an indication | ||||
| from the metadata server to the client of the <bcp14>REQUIRED</bcp14> way of | ||||
| performing COMMIT, either by sending the COMMIT to the data server | ||||
| or the metadata server. These two methods of dealing with the issue | ||||
| correspond to broad styles of implementation for a pNFS server | ||||
| supporting the file layout type. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When the flag is FALSE, COMMIT operations <bcp14>MUST</bcp14> be sent | ||||
| to the data server to which the corresponding WRITE operations were | ||||
| sent. This approach | ||||
| is sometimes useful when file striping is implemented within the | ||||
| pNFS server (instead of the file system), | ||||
| with the individual data servers each implementing | ||||
| their own file systems. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| When the flag is TRUE, COMMIT operations <bcp14>MUST</bcp14> be sent to the | ||||
| metadata server, rather than to the individual data servers. | ||||
| This approach is sometimes useful when file striping | ||||
| is implemented within the clustered file system that is the backend | ||||
| to the pNFS server. In such | ||||
| an implementation, each COMMIT to each | ||||
| data server might result in repeated writes of metadata | ||||
| blocks to the | ||||
| detriment of write performance. Sending a single COMMIT | ||||
| to the metadata server can be more efficient | ||||
| when there exists a clustered file | ||||
| system capable of implementing such a coordinated COMMIT. | ||||
| </t> | ||||
| <t> | ||||
| If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, | ||||
| then in order to maintain the current NFSv4.1 commit and | ||||
| recovery model, the data servers <bcp14>MUST</bcp14> return a common | ||||
| writeverf verifier in all WRITE responses for a given file | ||||
| layout, and the metadata server's COMMIT implementation | ||||
| must return the same writeverf. The value of the | ||||
| writeverf verifier <bcp14>MUST</bcp14> be changed at the metadata server | ||||
| or any data server that is referenced in the layout, | ||||
| whenever there is a server event that can possibly lead to | ||||
| loss of uncommitted data. The scope of the verifier can | ||||
| be for a file or for the entire pNFS server. It might be | ||||
| more difficult for the server to maintain the verifier | ||||
| at the file level, but the benefit is that only events | ||||
| that impact a given file will require recovery action. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that if the layout specified dense packing, then the | ||||
| offset used in a COMMIT to the MDS may differ from the | ||||
| offset used in a COMMIT to the data server. | ||||
| </t> | ||||
| <t> | ||||
| The single COMMIT to the metadata server will return a verifier, and | ||||
| the client should compare it to all the verifiers from the WRITEs and | ||||
| fail the COMMIT if there are any mismatched verifiers. If COMMIT to the | ||||
| metadata server fails, the client should re-send WRITEs for all the | ||||
| modified data in the file. The client should treat modified data with | ||||
| a mismatched verifier | ||||
| as a WRITE failure and try to recover by resending the WRITEs to the | ||||
| original data server or using another path to that data if the layout | ||||
| has not been recalled. Alternatively, the client can obtain | ||||
| a new layout or it could rewrite the data directly to the metadata server. If | ||||
| nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending | ||||
| a COMMIT to the metadata server might have no effect. If | ||||
| nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is FALSE, a COMMIT | ||||
| sent to the metadata server should be used only to commit data that | ||||
| was written to the metadata server. See <xref target="storage_device_recovery" format="default"/> | ||||
| for recovery options. | ||||
| </t> | ||||
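The client-side handling described above might look like the following sketch. The flag value 0x00000002 is assumed here to match the constant in the file layout's XDR definitions and should be checked against them; commit_target and writes_to_resend are hypothetical helper names, and verifiers are modeled as plain byte strings.

```python
# Assumed value of the flag; verify against the XDR definitions.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_target(nfl_util):
    """Return "mds" or "ds" depending on the flag in nfl_util."""
    return "mds" if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS else "ds"


def writes_to_resend(commit_verifier, write_verifiers):
    """List the written ranges whose verifier differs from the COMMIT's.

    write_verifiers maps (offset, length) to the writeverf returned by
    the WRITE for that range. A mismatch is treated as a WRITE failure,
    so the range has to be written again.
    """
    return sorted(
        byte_range
        for byte_range, verifier in write_verifiers.items()
        if verifier != commit_verifier
    )
```

An empty result means every WRITE verifier matched the verifier returned by the COMMIT to the metadata server, so no data needs to be re-sent.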
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>The Layout Iomode</name> | ||||
| <t> | ||||
| The layout iomode need not be used by the metadata server when | ||||
| servicing NFSv4.1 file-based layouts, although in some circumstances | ||||
| it may be useful. For example, if the server implementation | ||||
| supports reading from read-only replicas or mirrors, it would be | ||||
| useful for the server to return a layout enabling the client to do | ||||
| so. As such, the client <bcp14>SHOULD</bcp14> set the iomode based on its intent | ||||
| to read or write the data. The client may default to an iomode of | ||||
| LAYOUTIOMODE4_RW. The iomode need not be checked by the | ||||
| data servers when clients perform I/O. However, the data servers | ||||
| <bcp14>SHOULD</bcp14> still validate that the client holds a valid layout | ||||
| and return an error if the client does not. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Metadata and Data Server State Coordination</name> | ||||
| <section anchor="global_stateid" numbered="true" toc="default"> | ||||
| <name>Global Stateid Requirements</name> | ||||
| <t> | ||||
| When the client sends | ||||
| I/O to a data server, the stateid used <bcp14>MUST NOT</bcp14> be a layout stateid | ||||
| as returned by LAYOUTGET or sent by CB_LAYOUTRECALL. | ||||
| Permitted stateids are based on one of the following: | ||||
| an OPEN stateid | ||||
| (the stateid field of data type OPEN4resok as returned by OPEN), | ||||
| a delegation stateid (the stateid field of data types open_read_delegation4 | ||||
| and open_write_delegation4 as returned by OPEN or WANT_DELEGATION, | ||||
| or as sent by CB_PUSH_DELEG), or a stateid returned by the LOCK or LOCKU | ||||
| operations. The stateid sent to the data server <bcp14>MUST</bcp14> be sent with | ||||
| the seqid set to zero, indicating the most current version of that | ||||
| stateid, rather than indicating a specific non-zero seqid value. In | ||||
| no case is the use of special stateid values allowed. | ||||
| </t> | ||||
| <t> | ||||
| The stateid used for I/O <bcp14>MUST</bcp14> have the same | ||||
| effect and be subject to the same validation on a data server as it | ||||
| would if the I/O was being performed on the metadata server itself | ||||
| in the absence of pNFS. This has the implication that stateids are | ||||
| globally valid on both the metadata and data servers. This | ||||
| requires the metadata server to propagate changes in LOCK and OPEN | ||||
| state to the data servers, so that the data servers can | ||||
| validate I/O accesses. This is discussed further in <xref target="state_propagation" format="default"/>. Depending on when stateids are | ||||
| propagated, the existence of a valid stateid on the data server | ||||
| may act as proof of a valid layout. | ||||
| </t> | ||||
| <t> | ||||
| Clients performing I/O operations need to select an appropriate | ||||
| stateid based on the | ||||
| locks (including opens and delegations) held by the client and | ||||
| the various types of state-owners sending the I/O requests. The | ||||
| rules for doing so when referencing data servers are somewhat | ||||
| different from those discussed in <xref target="stateid_use" format="default"/>, | ||||
| which apply when accessing metadata servers. | ||||
| </t> | ||||
| <t> | ||||
| The following rules, applied in order of decreasing priority, govern | ||||
| the selection of the appropriate stateid: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the client holds a delegation for the file in question, the | ||||
| delegation stateid should be used. | ||||
| </li> | ||||
| <li> | ||||
| Otherwise, there must be an OPEN stateid for the current | ||||
| open-owner, and that | ||||
| OPEN stateid for the open file in question is used, unless | ||||
| mandatory locking prevents that. See below. | ||||
| </li> | ||||
| <li> | ||||
| If the data server had previously responded with NFS4ERR_LOCKED | ||||
| to use of the OPEN stateid, then the client should use the | ||||
| byte-range lock stateid whenever one exists for that open file | ||||
| with the current lock-owner. | ||||
| </li> | ||||
| <li> | ||||
| Special stateids should never be used. If they are used, the data | ||||
| server <bcp14>MUST</bcp14> reject the I/O with an NFS4ERR_BAD_STATEID error. | ||||
| </li> | ||||
| </ul> | ||||
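The priority order above can be sketched as a small selection function. The Stateid tuple and the select_ds_stateid name are hypothetical; the sketch assumes the client has tracked whether the data server previously answered NFS4ERR_LOCKED for this open file, and it clears the seqid as required for I/O sent to data servers.

```python
from collections import namedtuple

# Simplified stand-in for stateid4: a seqid plus the opaque "other" field.
Stateid = namedtuple("Stateid", ["seqid", "other"])


def select_ds_stateid(delegation=None, open_stateid=None,
                      lock_stateid=None, saw_nfs4err_locked=False):
    """Pick the stateid for data-server I/O, in decreasing priority."""
    if delegation is not None:
        chosen = delegation
    elif open_stateid is None:
        raise ValueError("no OPEN stateid for the current open-owner")
    elif saw_nfs4err_locked and lock_stateid is not None:
        # Mandatory locking rejected the OPEN stateid earlier.
        chosen = lock_stateid
    else:
        chosen = open_stateid
    # Stateids sent to a data server carry a seqid of zero.
    return chosen._replace(seqid=0)
```

Special stateids never enter this function, which mirrors the last rule: they are not candidates for data-server I/O.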
| </section> | ||||
| <section anchor="state_propagation" numbered="true" toc="default"> | ||||
| <name>Data Server State Propagation</name> | ||||
| <t> | ||||
| Since the metadata server, which handles byte-range lock and | ||||
| open-mode state changes as well as ACLs, might not be | ||||
| co-located with the data servers where I/O accesses | ||||
| are validated, the server implementation <bcp14>MUST</bcp14> take | ||||
| care of propagating changes of this state to the data | ||||
| servers. Once the propagation to the data servers is | ||||
| complete, the full effect of those changes <bcp14>MUST</bcp14> be in | ||||
| effect at the data servers. However, some state changes | ||||
| need not be propagated immediately, although all changes | ||||
| <bcp14>SHOULD</bcp14> be propagated promptly. These state propagations | ||||
| have an impact on the design of the control protocol, | ||||
| even though the control protocol is outside of the scope | ||||
| of this specification. Immediate propagation refers to | ||||
| the synchronous propagation of state from the metadata | ||||
| server to the data server(s); the propagation must be | ||||
| complete before returning to the client. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Lock State Propagation</name> | ||||
| <t> | ||||
| If the pNFS server supports mandatory byte-range locking, any mandatory byte-range locks | ||||
| on a file <bcp14>MUST</bcp14> be made effective at the data servers before | ||||
| the request that establishes them returns to the caller. The | ||||
| effect <bcp14>MUST</bcp14> be the same as if the mandatory byte-range lock state were | ||||
| synchronously propagated to the data servers, even though the | ||||
| details of the control protocol may avoid actual transfer of the | ||||
| state under certain circumstances. | ||||
| </t> | ||||
| <t> | ||||
| On the other hand, since | ||||
| advisory byte-range lock state is not used for checking I/O accesses at | ||||
| the data servers, there is no semantic reason for propagating | ||||
| advisory byte-range lock state to the data servers. | ||||
| Since updates to advisory locks neither confer nor remove | ||||
| privileges, these changes need not be propagated immediately, and | ||||
| may not need to be propagated promptly. The updates to advisory | ||||
| locks need only be propagated when the data server needs to | ||||
| resolve a question about a stateid. In fact, if byte-range locking | ||||
| is not mandatory (i.e., is advisory) the clients are advised to avoid | ||||
| using the byte-range lock-based stateids for I/O. The stateids returned by | ||||
| OPEN are sufficient and eliminate overhead for this kind of state | ||||
| propagation. | ||||
| </t> | ||||
| <t> | ||||
| If a client gets back an NFS4ERR_LOCKED error from a | ||||
| data server, this is an indication that mandatory byte-range | ||||
| locking is in force. The client recovers from this by | ||||
| getting a byte-range lock that covers the affected range | ||||
| and re-sends the I/O with the stateid of the byte-range lock. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Open and Deny Mode Validation</name> | ||||
| <t> | ||||
| Open and deny mode validation <bcp14>MUST</bcp14> be performed against | ||||
| the open and deny mode(s) held by the data servers. When | ||||
| access is reduced or a deny mode made more restrictive | ||||
| (because of CLOSE or OPEN_DOWNGRADE), the data server <bcp14>MUST</bcp14> | ||||
| prevent any I/Os that would be denied if performed on the | ||||
| metadata server. When access is expanded, | ||||
| the data server <bcp14>MUST</bcp14> make sure that no requests are | ||||
| subsequently rejected because of | ||||
| open or deny issues that no longer apply, given the | ||||
| previous relaxation. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>File Attributes</name> | ||||
| <t> | ||||
| Since the SETATTR operation has the ability to modify state that is | ||||
| visible on both the metadata and data servers (e.g., the size), | ||||
| care must be taken to ensure that the resultant state across the | ||||
| set of data servers is consistent, especially when truncating or | ||||
| growing the file. | ||||
| </t> | ||||
| <t> | ||||
| As described earlier, the LAYOUTCOMMIT operation is used to ensure | ||||
| that the metadata is synchronized with changes made to the data servers. For the NFSv4.1-based data storage protocol, | ||||
| it is necessary to re-synchronize | ||||
| state such as the size attribute, and the setting of mtime/change/atime. | ||||
| See <xref target="committing_layout" format="default"/> for a full | ||||
| description of the semantics regarding LAYOUTCOMMIT and | ||||
| attribute synchronization. It should be noted that by | ||||
| using an NFSv4.1-based layout type, it is possible to | ||||
| synchronize this state before LAYOUTCOMMIT occurs. For | ||||
| example, the control protocol can be used to query the | ||||
| attributes present on the data servers. | ||||
| </t> | ||||
| <t> | ||||
| Any changes to file attributes that control authorization or | ||||
| access as reflected by ACCESS calls or READs and WRITEs on the | ||||
| metadata server <bcp14>MUST</bcp14> be propagated to the data servers for | ||||
| enforcement on READ and WRITE I/O calls. If the changes made on the | ||||
| metadata server result in more restrictive access permissions for | ||||
| any user, those changes <bcp14>MUST</bcp14> be propagated to the data servers | ||||
| synchronously. | ||||
| </t> | ||||
| <t> | ||||
| The OPEN operation (<xref target="OP_OPEN_IMPLEMENTATION" format="default"/>) does not impose any requirement that I/O operations | ||||
| on an open file have the same credentials as the OPEN | ||||
| itself (unless EXCHGID4_FLAG_BIND_PRINC_STATEID is | ||||
| set when EXCHANGE_ID creates the client ID), and so it | ||||
| requires the server's READ and WRITE operations to | ||||
| perform appropriate access checking. Changes to ACLs | ||||
| also require new access checking by READ and WRITE on | ||||
| the server. The propagation of access-right changes due | ||||
| to changes in ACLs may be asynchronous only if the server | ||||
| implementation is able to determine that the updated | ||||
| ACL is not more restrictive for any user specified in | ||||
| the old ACL. Due to the relative infrequency of ACL | ||||
| updates, it is suggested that all changes be propagated | ||||
| synchronously. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="component_file_size" numbered="true" toc="default"> | ||||
| <name>Data Server Component File Size</name> | ||||
| <t> | ||||
| A potential problem exists when a component data file on a | ||||
| particular data server has grown past EOF; the problem exists for | ||||
| both dense and sparse layouts. Imagine the following scenario: a | ||||
| client creates a new file (size == 0) and writes to byte 131072; the | ||||
| client then seeks to the beginning of the file and reads byte 100. | ||||
| The client should receive zeroes back as a result of the READ. However, | ||||
| if the striping pattern directs the client to send the READ to | ||||
| a data server other than the one that received the | ||||
| client's original WRITE, the data server servicing the READ may | ||||
| believe that the file's size is still 0 bytes. In that event, the | ||||
| data server's READ response will contain zero bytes and an | ||||
| indication of EOF. The data server can only return zeroes if it knows that | ||||
| the file's size has been extended. This would require the immediate | ||||
| propagation of the file's size to all data servers, which is | ||||
| potentially very costly. Therefore, the client that has | ||||
| initiated the extension of the file's size <bcp14>MUST</bcp14> be prepared to deal | ||||
| with these EOF conditions. | ||||
| When the offset in the arguments to READ | ||||
| is less than the client's view of the file size, if the READ response | ||||
| indicates EOF and/or contains fewer bytes than requested, the client | ||||
| will interpret such a response as a hole in the file, and the | ||||
| NFS client will substitute zeroes for the data. | ||||
| </t> | ||||
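The zero substitution described above can be sketched as follows. The function name fill_read_hole is hypothetical, and limiting the padding to the client's own view of the file size is an assumption of this sketch rather than a stated requirement.

```python
def fill_read_hole(offset, count, data, client_file_size):
    """Pad a short data-server READ reply with zeroes.

    offset and count are the READ arguments, data is the payload the
    data server returned, and client_file_size is the client's view of
    the file size.
    """
    if offset >= client_file_size:
        return data  # a genuine EOF from the client's point of view
    expected = min(count, client_file_size - offset)
    if len(data) < expected:
        # Treat the missing bytes as a hole in the file.
        return data + bytes(expected - len(data))
    return data
```

In the scenario above, the write to byte 131072 makes the client's view of the size 131073, so fill_read_hole(100, 1, b"", 131073) returns a single zero byte even though the data server reported EOF.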
| <t> | ||||
| The NFSv4.1 protocol only provides close-to-open file data cache | ||||
| semantics, meaning that when the file is closed, all modified data is | ||||
| written to the server. When a subsequent OPEN of the file is | ||||
| done, the change attribute is inspected for a difference from a | ||||
| cached value for the change attribute. For the case above, this means | ||||
| that a LAYOUTCOMMIT will be done at close (along with the data | ||||
| WRITEs) and will update the file's size and change attribute. Access | ||||
| from another client after that point will result in the appropriate | ||||
| size being returned. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="file_layout_revoke" numbered="true" toc="default"> | ||||
| <name>Layout Revocation and Fencing</name> | ||||
| <t> | ||||
| As described in <xref target="crash_recovery" format="default"/>, the | ||||
| layout-type-specific storage protocol is responsible | ||||
| for handling the effects of I/Os that started before | ||||
| lease expiration and extend through lease expiration. | ||||
| The LAYOUT4_NFSV4_1_FILES layout type | ||||
| can prevent all I/Os to data servers from | ||||
| being executed after lease expiration (this prevention is | ||||
| called "fencing"), without relying | ||||
| on a precise client lease timer and without requiring | ||||
| data servers to maintain lease timers. The | ||||
| LAYOUT4_NFSV4_1_FILES pNFS server has the flexibility to | ||||
| revoke individual layouts, and thus fence I/O on a per-file | ||||
| basis. | ||||
| </t> | ||||
| <t> | ||||
| In addition to lease expiration, | ||||
| a layout can be revoked because the client fails to respond to | ||||
| a CB_LAYOUTRECALL, the metadata server restarts, or | ||||
| an administrator intervenes. Regardless | ||||
| of the reason, once a client's layout has been revoked, the pNFS | ||||
| server <bcp14>MUST</bcp14> prevent the client from sending I/O for the affected file | ||||
| from and to all data servers; in other words, it <bcp14>MUST</bcp14> fence the | ||||
| client from the affected file on the data servers. | ||||
| </t> | ||||
| <t> | ||||
| Fencing works as follows. As described in <xref target="pnfs_session_stuff" format="default"/>, in COMPOUND procedure | ||||
| requests to the data server, the data filehandle provided | ||||
| by the PUTFH operation and the stateid in the READ or | ||||
| WRITE operation are used to ensure that the client has | ||||
| a valid layout for the I/O being performed; if it does | ||||
| not, the I/O is rejected with NFS4ERR_PNFS_NO_LAYOUT. | ||||
| The server can simply check the stateid and, additionally, | ||||
| make the data filehandle stale if the layout specified | ||||
| a data filehandle that is different from the metadata server's | ||||
| filehandle for the file (see the nfl_fh_list description in | ||||
| <xref target="file_data_types" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| Before the metadata server takes any action to revoke | ||||
| layout state given out by a previous instance, it must make | ||||
| sure that all layout state from that previous instance is | ||||
| invalidated at the data servers. This has the following | ||||
| implications. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The metadata server must not restripe a | ||||
| file until it has contacted all of the data servers | ||||
| to invalidate the layouts from the previous instance. | ||||
| </li> | ||||
| <li> | ||||
| The metadata server must not give out mandatory locks that conflict with | ||||
| layouts from the previous instance without either doing | ||||
| a specific layout invalidation (as it would have to do anyway) | ||||
| or doing a global data server invalidation. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="file_security_considerations" numbered="true" toc="default"> | ||||
| <name>Security Considerations for the File Layout Type</name> | ||||
| <t> | ||||
| The NFSv4.1 file layout type <bcp14>MUST</bcp14> adhere to the security | ||||
| considerations outlined in <xref target="security_considerations_pnfs" format="default"/>. NFSv4.1 data servers <bcp14>MUST</bcp14> make all of the | ||||
| required access checks on each READ or WRITE I/O as determined by | ||||
| the NFSv4.1 protocol. | ||||
| If the metadata server would deny a READ or WRITE | ||||
| operation on a file due to its ACL, mode attribute, open | ||||
| access mode, open deny mode, mandatory byte-range lock state, or any other | ||||
| attributes and state, the data server <bcp14>MUST</bcp14> also deny the | ||||
| READ or WRITE operation. This impacts the control | ||||
| protocol and the propagation of state from the metadata | ||||
| server to the data servers; see <xref target="state_propagation" format="default"/> for more details. | ||||
| </t> | ||||
| <t> | ||||
| The methods for authentication, | ||||
| integrity, and privacy for data servers based on the | ||||
| LAYOUT4_NFSV4_1_FILES layout type are the same as those used | ||||
| by metadata servers. Metadata and data servers | ||||
| use ONC RPC security flavors to | ||||
| authenticate, and SECINFO and SECINFO_NO_NAME | ||||
| to negotiate the security mechanism and services | ||||
| to be used. Thus, when using the LAYOUT4_NFSV4_1_FILES layout type, | ||||
| the impact on the RPC-based security | ||||
| model due to pNFS (as alluded to in Sections | ||||
| <xref target="rpc_and_security" format="counter"/> | ||||
| and <xref target="parallel_access" format="counter"/>) is zero. | ||||
| </t> | ||||
| <t> | ||||
| For a given file object, a metadata server | ||||
| <bcp14>MAY</bcp14> require different security parameters | ||||
| (secinfo4 value) than the data server. | ||||
| For a given file object with multiple data servers, | ||||
| the secinfo4 value <bcp14>SHOULD</bcp14> be the same across | ||||
| all data servers. If the secinfo4 values across a metadata server | ||||
| and its data servers differ for a specific file, the | ||||
| mapping of the principal to the server's internal user identifier | ||||
| <bcp14>MUST</bcp14> be the same in order for the access-control checks based on | ||||
| ACL, mode, open and deny mode, and mandatory locking to be | ||||
| consistent across the pNFS server. | ||||
| </t> | ||||
| <t> | ||||
| If an NFSv4.1 implementation supports | ||||
| pNFS and supports NFSv4.1 file layouts, then the | ||||
| implementation <bcp14>MUST</bcp14> support the SECINFO_NO_NAME operation on both | ||||
| the metadata and data servers. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="internationalization" numbered="true" toc="default"> | ||||
| <name>Internationalization</name> | ||||
| <t> | ||||
| The primary area in which NFSv4.1 needs to deal with | ||||
| internationalization, or I18N, is the handling of file names and other | ||||
| strings used within the protocol. The choice of string | ||||
| representation must allow reasonable name/string access to clients | ||||
| that use various languages. The UTF-8 encoding of the UCS (Universal | ||||
| Multiple-Octet Coded Character Set) as defined | ||||
| by <xref target="ISO.10646-1.1993" format="default">ISO10646</xref> allows for this type | ||||
| of access and follows the policy described in "IETF Policy on | ||||
| Character Sets and Languages", <xref target="RFC2277" format="default">RFC 2277</xref>. | ||||
| </t> | ||||
| <t> | ||||
| <xref target="RFC3454" format="default">RFC 3454</xref>, otherwise known as "stringprep", documents a | ||||
| framework for using Unicode/UTF-8 in networking protocols so as "to | ||||
| increase the likelihood that string input and string comparison work | ||||
| in ways that make sense for typical users throughout the world". A | ||||
| protocol must define a profile of stringprep "in order to fully | ||||
| specify the processing options". The remainder of this | ||||
| section defines the NFSv4.1 stringprep profiles. Much of the terminology | ||||
| used for the remainder of this section comes from stringprep. | ||||
| </t> | ||||
| <t> | ||||
| There are three UTF-8 string types defined for NFSv4.1: | ||||
| utf8str_cs, utf8str_cis, and utf8str_mixed. Separate profiles are | ||||
| defined for each. Each profile defines the following, as required by | ||||
| stringprep: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The intended applicability of the profile. | ||||
| </li> | ||||
| <li> | ||||
| The character repertoire that is the input and output to stringprep | ||||
| (which is Unicode 3.2 for the referenced version of stringprep). | ||||
| However, NFSv4.1 implementations are not limited to 3.2. | ||||
| </li> | ||||
| <li> | ||||
| The mapping tables from stringprep used (as described in Section | ||||
| <xref target="RFC3454" sectionFormat="bare" section="3"/> of stringprep). | ||||
| </li> | ||||
| <li> | ||||
| Any additional mapping tables specific to the profile. | ||||
| </li> | ||||
| <li> | ||||
| The Unicode normalization used, if any (as described in Section | ||||
| <xref target="RFC3454" sectionFormat="bare" section="4"/> of stringprep). | ||||
| </li> | ||||
| <li> | ||||
| The tables from the stringprep listing of characters that are prohibited | ||||
| as output (as described in Section <xref target="RFC3454" sectionFormat="bare" section="5"/> of stringprep). | ||||
| </li> | ||||
| <li> | ||||
| The bidirectional string testing used, if any (as described in Section <xref target="RFC3454" sectionFormat="bare" section="6"/> of stringprep). | ||||
| </li> | ||||
| <li> | ||||
| Any additional characters that are prohibited as output specific to | ||||
| the profile. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Stringprep discusses Unicode characters, whereas NFSv4.1 renders | ||||
| UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to | ||||
| Unicode, when the remainder of this document refers to Unicode, | ||||
| the reader should assume UTF-8. | ||||
| </t> | ||||
| <t> | ||||
| Much of the text for the profiles comes from RFC 3491 <xref target="RFC3491" format="default"/>. | ||||
| </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Stringprep Profile for the utf8str_cs Type</name> | ||||
| <t> | ||||
| Every use of the utf8str_cs type definition in the NFSv4.1 protocol specification follows the profile named | ||||
| nfs4_cs_prep. | ||||
| </t> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Intended Applicability of the nfs4_cs_prep Profile</name> | ||||
| <t> | ||||
| The utf8str_cs type is a case-sensitive string of UTF-8 characters. | ||||
| Its primary use in NFSv4.1 is for naming components and | ||||
| pathnames. Components and pathnames are stored on the server's | ||||
| file system. Two valid distinct UTF-8 strings might be the same after | ||||
| processing via the nfs4_cs_prep profile. If the strings are two names | ||||
| inside a directory, the NFSv4.1 server will need to either: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| disallow the creation of a second name if its post-processed form | ||||
| collides with that of an existing name, or | ||||
| </li> | ||||
| <li> | ||||
| allow the creation of the second name, but arrange that, after | ||||
| post-processing, the second name differs from the post-processed | ||||
| form of the first name. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Character Repertoire of nfs4_cs_prep</name> | ||||
| <t> | ||||
| The nfs4_cs_prep profile uses Unicode 3.2, as defined in stringprep's | ||||
| Appendix A.1. | ||||
| However, NFSv4.1 implementations are not limited to 3.2. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Mapping Used by nfs4_cs_prep</name> | ||||
| <t> | ||||
| The nfs4_cs_prep profile specifies mapping using the | ||||
| following tables from stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| Table B.1 | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Table B.2 is normally not part of the nfs4_cs_prep profile as it is | ||||
| primarily for dealing with case-insensitive comparisons. However, if | ||||
| the NFSv4.1 file server supports the case_insensitive file system | ||||
| attribute, and if case_insensitive is TRUE, the NFSv4.1 server | ||||
| <bcp14>MUST</bcp14> use Table B.2 (in addition to Table B.1) when processing | ||||
| utf8str_cs strings, and the NFSv4.1 client <bcp14>MUST</bcp14> assume Table B.2 | ||||
| (in addition to Table B.1) is being used. | ||||
| </t> | ||||
| <t> | ||||
| If the case_preserving attribute is present and set to FALSE, then the | ||||
| NFSv4.1 server <bcp14>MUST</bcp14> use Table B.2 to map case when processing | ||||
| utf8str_cs strings. Whether the server maps from lower to upper case | ||||
| or from upper to lower case is an implementation dependency. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Normalization Used by nfs4_cs_prep</name> | ||||
| <t> | ||||
| The nfs4_cs_prep profile does not specify a normalization form. A | ||||
| later revision of this specification may specify a particular | ||||
| normalization form. Therefore, the server and client can expect that | ||||
| they may receive unnormalized characters within protocol requests and | ||||
| responses. If the operating environment requires normalization, then | ||||
| the implementation must normalize utf8str_cs strings within the | ||||
| protocol before presenting the information to an application (at the | ||||
| client) or local file system (at the server). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Prohibited Output for nfs4_cs_prep</name> | ||||
| <t> | ||||
| The nfs4_cs_prep profile RECOMMENDS prohibiting the use of the | ||||
| following tables from stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table C.5</li> | ||||
| <li>Table C.6</li> | ||||
| </ul> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Bidirectional Output for nfs4_cs_prep</name> | ||||
| <t> | ||||
| The nfs4_cs_prep profile does not specify any checking of | ||||
| bidirectional strings. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Stringprep Profile for the utf8str_cis Type</name> | ||||
| <t> | ||||
| Every use of the utf8str_cis type definition in the NFSv4.1 | ||||
| protocol specification follows the profile named nfs4_cis_prep. | ||||
| </t> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Intended Applicability of the nfs4_cis_prep Profile</name> | ||||
| <t> | ||||
| The utf8str_cis type is a case-insensitive string of | ||||
| UTF-8 characters. Its primary use in NFSv4.1 is | ||||
| for naming NFS servers. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Character Repertoire of nfs4_cis_prep</name> | ||||
| <t> | ||||
| The nfs4_cis_prep profile uses Unicode 3.2, as defined in stringprep's | ||||
| Appendix A.1. However, NFSv4.1 implementations are not limited to 3.2. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Mapping Used by nfs4_cis_prep</name> | ||||
| <t> | ||||
| The nfs4_cis_prep profile specifies mapping using the following tables from | ||||
| stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table B.1</li> | ||||
| <li>Table B.2</li> | ||||
| </ul> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Normalization Used by nfs4_cis_prep</name> | ||||
| <t> | ||||
| The nfs4_cis_prep profile specifies using Unicode normalization form | ||||
| KC, as described in stringprep. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Prohibited Output for nfs4_cis_prep</name> | ||||
| <t> | ||||
| The nfs4_cis_prep profile prohibits, as output, the characters in the | ||||
| following tables from stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table C.1.2</li> | ||||
| <li>Table C.2.2</li> | ||||
| <li>Table C.3</li> | ||||
| <li>Table C.4</li> | ||||
| <li>Table C.5</li> | ||||
| <li>Table C.6</li> | ||||
| <li>Table C.7</li> | ||||
| <li>Table C.8</li> | ||||
| <li>Table C.9</li> | ||||
| </ul> | ||||
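| <t> | ||||
| As an illustration only, the prohibited-output check listed above can be | ||||
| sketched with the table predicates in Python's standard stringprep module, | ||||
| which implements the RFC 3454 tables against Unicode 3.2. The helper name | ||||
| find_prohibited and its return shape are assumptions made for this sketch; | ||||
| mapping, normalization, and bidirectional checking are not covered here. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
import stringprep

# Table checks listed for nfs4_cis_prep, in the order given above.
PROHIBITED_TABLES = (
    ("C.1.2", stringprep.in_table_c12),
    ("C.2.2", stringprep.in_table_c22),
    ("C.3", stringprep.in_table_c3),
    ("C.4", stringprep.in_table_c4),
    ("C.5", stringprep.in_table_c5),
    ("C.6", stringprep.in_table_c6),
    ("C.7", stringprep.in_table_c7),
    ("C.8", stringprep.in_table_c8),
    ("C.9", stringprep.in_table_c9),
)


def find_prohibited(value):
    """Return (character, table) pairs for prohibited characters.

    Hypothetical helper: only the first matching table is reported
    for each character.
    """
    hits = []
    for ch in value:
        for table, in_table in PROHIBITED_TABLES:
            if in_table(ch):
                hits.append((ch, table))
                break
    return hits
```
| ]]></sourcecode> | ||||
| <t> | ||||
| An empty result means that no character of the string falls in any of the | ||||
| listed tables; a server name containing a no-break space (U+00A0), for | ||||
| instance, would be reported under Table C.1.2. | ||||
| </t> | ||||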
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Bidirectional Output for nfs4_cis_prep</name> | ||||
| <t> | ||||
| The nfs4_cis_prep profile specifies checking bidirectional strings as | ||||
| described in stringprep's Section <xref target="RFC3454" sectionFormat="bare" section="6"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Stringprep Profile for the utf8str_mixed Type</name> | ||||
| <t> | ||||
| Every use of the utf8str_mixed type definition in the NFSv4.1 | ||||
| protocol specification follows the profile named nfs4_mixed_prep. | ||||
| </t> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Intended Applicability of the nfs4_mixed_prep Profile</name> | ||||
| <t> | ||||
| The utf8str_mixed type is a string of UTF-8 characters, with a prefix | ||||
| that is case sensitive, a separator equal to '@', and a suffix that is a | ||||
| fully qualified domain name. Its primary use in NFSv4.1 is for | ||||
| naming principals identified in an Access Control Entry. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Character Repertoire of nfs4_mixed_prep</name> | ||||
| <t> | ||||
| The nfs4_mixed_prep profile uses Unicode 3.2, as defined in | ||||
| stringprep's Appendix A.1. | ||||
| However, NFSv4.1 implementations are not limited to 3.2. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Mapping Used by nfs4_mixed_prep</name> | ||||
| <t> | ||||
| For the prefix and the separator of a utf8str_mixed | ||||
| string, the nfs4_mixed_prep profile specifies mapping | ||||
| using the following table from stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table B.1</li> | ||||
| </ul> | ||||
| <t> | ||||
| For the suffix of a utf8str_mixed string, the nfs4_mixed_prep | ||||
| profile specifies mapping using the following tables from | ||||
| stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table B.1</li> | ||||
| <li>Table B.2</li> | ||||
| </ul> | ||||
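| <t> | ||||
| The split mapping described above might look as follows in Python, using | ||||
| the standard stringprep module for Tables B.1 and B.2. This is a minimal | ||||
| sketch of the mapping step only: the helper name map_mixed is hypothetical, | ||||
| and treating the last '@' as the separator is an assumption of this sketch, | ||||
| not a rule stated in this section. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
import stringprep


def map_mixed(value):
    """Apply only the mapping step of nfs4_mixed_prep (hypothetical)."""
    # Assumption: the separator is the last '@' in the string.
    prefix, sep, suffix = value.rpartition("@")
    if not sep:
        raise ValueError("utf8str_mixed value has no '@' separator")
    # Prefix: Table B.1 only (characters mapped to nothing); case is kept.
    prefix = "".join(ch for ch in prefix if not stringprep.in_table_b1(ch))
    # Suffix: Table B.1, then Table B.2 (case folding for use with NFKC).
    suffix = "".join(
        stringprep.map_table_b2(ch)
        for ch in suffix
        if not stringprep.in_table_b1(ch)
    )
    return prefix + sep + suffix
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Under these assumptions, the case of the prefix is preserved while the | ||||
| domain suffix is folded, so "Alice@EXAMPLE.COM" maps to "Alice@example.com". | ||||
| </t> | ||||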
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Normalization Used by nfs4_mixed_prep</name> | ||||
| <t> | ||||
| The nfs4_mixed_prep profile specifies using Unicode normalization form | ||||
| KC, as described in stringprep. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Prohibited Output for nfs4_mixed_prep</name> | ||||
| <t> | ||||
| The nfs4_mixed_prep profile prohibits, as output, the characters in the | ||||
| following tables from stringprep: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>Table C.1.2</li> | ||||
| <li>Table C.2.2</li> | ||||
| <li>Table C.3</li> | ||||
| <li>Table C.4</li> | ||||
| <li>Table C.5</li> | ||||
| <li>Table C.6</li> | ||||
| <li>Table C.7</li> | ||||
| <li>Table C.8</li> | ||||
| <li>Table C.9</li> | ||||
| </ul> | ||||
| </section> | ||||
| <section toc="exclude" numbered="true"> | ||||
| <name>Bidirectional Output for nfs4_mixed_prep</name> | ||||
| <t> | ||||
| The nfs4_mixed_prep profile specifies checking bidirectional strings | ||||
| as described in stringprep's Section <xref target="RFC3454" sectionFormat="bare" section="6"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="utf8_caps" numbered="true" toc="default"> | ||||
| <name>UTF-8 Capabilities</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1; | ||||
| const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2; | ||||
| typedef uint32_t fs_charset_cap4;]]></sourcecode> | ||||
| <t> | ||||
| Because some operating environments and file systems do | ||||
| not enforce character set encodings, NFSv4.1 supports the | ||||
| fs_charset_cap attribute (<xref target="attrdef_fs_charset_cap" format="default"/>) | ||||
| that indicates to the client a file system's UTF-8 capabilities. | ||||
| The attribute is an integer containing a pair of flags. | ||||
| The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, which, if set | ||||
| to one, tells the client that the file system contains non-UTF-8 characters | ||||
| and that the server will not convert non-UTF-8 characters to UTF-8 when the | ||||
| client reads a symbolic link or directory; nor will operations with component | ||||
| names or pathnames in their arguments convert the strings to UTF-8. | ||||
| The second flag is FSCHARSET_CAP4_ALLOWS_ONLY_UTF8, which, if set to | ||||
| one, indicates that the server will accept (and generate) only | ||||
| UTF-8 characters on the file system. If | ||||
| FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, | ||||
| FSCHARSET_CAP4_CONTAINS_NON_UTF8 <bcp14>MUST</bcp14> be set to zero. | ||||
| FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 <bcp14>SHOULD</bcp14> always be set to one. | ||||
| </t> | ||||
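| <t> | ||||
| A client-side reading of these flags can be sketched as follows. The flag | ||||
| values come from the XDR definition above; the helper name | ||||
| describe_fs_charset_cap and the choice to raise an exception on the | ||||
| forbidden combination are assumptions made for illustration. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Flag values are taken from the XDR definition above.
FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2


def describe_fs_charset_cap(cap):
    """Decode an fs_charset_cap4 value (hypothetical helper).

    Returns (contains_non_utf8, allows_only_utf8) and raises
    ValueError for the combination the text forbids.
    """
    contains_non_utf8 = bool(cap & FSCHARSET_CAP4_CONTAINS_NON_UTF8)
    allows_only_utf8 = bool(cap & FSCHARSET_CAP4_ALLOWS_ONLY_UTF8)
    if allows_only_utf8 and contains_non_utf8:
        # ALLOWS_ONLY_UTF8 set to one requires CONTAINS_NON_UTF8 to be zero.
        raise ValueError("inconsistent fs_charset_cap4 flags")
    return contains_non_utf8, allows_only_utf8
```
| ]]></sourcecode> | ||||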
| </section> | ||||
| <section anchor="utf8_related_errors" numbered="true" toc="default"> | ||||
| <name>UTF-8 Related Errors</name> | ||||
| <t> | ||||
| Where the client sends an invalid UTF-8 string, the server should | ||||
| return NFS4ERR_INVAL (see <xref target="error_definitions" format="default"/>). | ||||
| This includes cases in which inappropriate prefixes are detected and | ||||
| where the count includes trailing bytes that do not constitute a full | ||||
| UCS character. | ||||
| </t> | ||||
| <t> | ||||
| Where the client-supplied string is valid UTF-8 but contains | ||||
| characters that are not supported by the server as a value for that | ||||
| string (e.g., names containing characters outside of Unicode plane 0 on | ||||
| file systems that fail to support such characters despite their | ||||
| presence in the Unicode standard), the server should return | ||||
| NFS4ERR_BADCHAR. | ||||
| </t> | ||||
| <t> | ||||
| Where a UTF-8 string is used as a file name, and the file system (while | ||||
| supporting all of the characters within the name) does not allow that | ||||
| particular name to be used, the server should return the error <xref target="error_definitions" format="default">NFS4ERR_BADNAME</xref>. This includes | ||||
| situations in which the server file system imposes a normalization | ||||
| constraint on name strings, but will also include such situations as | ||||
| file system prohibitions of "." and ".." as file names for certain | ||||
| operations, and other such constraints. | ||||
| </t> | ||||
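| <t> | ||||
| The three cases above can be sketched as a single server-side check. The | ||||
| error numbers are those listed in the Protocol Error Definitions table; the | ||||
| helper name check_component_name is hypothetical, and the two file system | ||||
| restrictions it models (plane 0 characters only, and refusal of "." and "..") | ||||
| are assumptions chosen to match the examples in the text, not requirements | ||||
| on every server. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Error numbers are taken from the Protocol Error Definitions table.
NFS4_OK = 0
NFS4ERR_INVAL = 22
NFS4ERR_BADCHAR = 10040
NFS4ERR_BADNAME = 10041


def check_component_name(raw):
    """Classify a component name received as bytes (hypothetical)."""
    try:
        # Strict decoding rejects malformed or truncated sequences.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return NFS4ERR_INVAL
    # Assumption: this file system stores only plane 0 characters.
    if any(ord(ch) > 0xFFFF for ch in name):
        return NFS4ERR_BADCHAR
    # Assumption: this file system refuses "." and ".." as names.
    if name in (".", ".."):
        return NFS4ERR_BADNAME
    return NFS4_OK
```
| ]]></sourcecode> | ||||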
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Error Values</name> | ||||
| <t> | ||||
| NFS error numbers are assigned to failed operations within a | ||||
| Compound (COMPOUND or CB_COMPOUND) request. A Compound request | ||||
| contains a number of NFS operations that have their results | ||||
| encoded in sequence in a Compound reply. The results of successful | ||||
| operations will consist of an NFS4_OK status followed by the | ||||
| encoded results of the operation. If an NFS operation fails, an | ||||
| error status will be entered in the reply and the Compound | ||||
| request will be terminated. | ||||
| </t> | ||||
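| <t> | ||||
| The evaluation order described above can be sketched as a loop that stops | ||||
| at the first failing operation. Modeling each operation as a callable that | ||||
| returns a (status, result) pair is an assumption made for illustration, as | ||||
| is the helper name run_compound; the sketch does not represent the | ||||
| on-the-wire encoding. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_OK = 0


def run_compound(operations):
    """Evaluate operations in order and stop at the first failure.

    `operations` is a list of callables, each returning a
    (status, result) pair; this shape is a hypothetical model.
    """
    results = []
    for op in operations:
        status, result = op()
        results.append((status, result))
        if status != NFS4_OK:
            break  # later operations are not evaluated
    return results
```
| ]]></sourcecode> | ||||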
| <section numbered="true" toc="default"> | ||||
| <name>Error Definitions</name> | ||||
| <table anchor="error_definitions" align="center"> | ||||
| <name>Protocol Error Definitions</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Error</th> | ||||
| <th align="left">Number</th> | ||||
| <th align="left">Description</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">NFS4_OK</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"> | ||||
| <xref target="err_OK" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ACCESS</td> | ||||
| <td align="left">13</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ACCESS" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ATTRNOTSUPP</td> | ||||
| <td align="left">10032</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ATTRNOTSUPP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ADMIN_REVOKED</td> | ||||
| <td align="left">10047</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ADMIN_REVOKED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BACK_CHAN_BUSY</td> | ||||
| <td align="left">10057</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BACK_CHAN_BUSY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADCHAR</td> | ||||
| <td align="left">10040</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADCHAR" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADHANDLE</td> | ||||
| <td align="left">10001</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADHANDLE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADIOMODE</td> | ||||
| <td align="left">10049</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADIOMODE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADLAYOUT</td> | ||||
| <td align="left">10050</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADLAYOUT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADNAME</td> | ||||
| <td align="left">10041</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADNAME" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADOWNER</td> | ||||
| <td align="left">10039</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADOWNER" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADSESSION</td> | ||||
| <td align="left">10052</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADSESSION" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADSLOT</td> | ||||
| <td align="left">10053</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADSLOT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADTYPE</td> | ||||
| <td align="left">10007</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADTYPE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADXDR</td> | ||||
| <td align="left">10036</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BADXDR" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_COOKIE</td> | ||||
| <td align="left">10003</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_COOKIE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_HIGH_SLOT</td> | ||||
| <td align="left">10077</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_HIGH_SLOT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_RANGE</td> | ||||
| <td align="left">10042</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_RANGE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_SEQID</td> | ||||
| <td align="left">10026</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_SEQID" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_SESSION_DIGEST</td> | ||||
| <td align="left">10051</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_SESSION_DIGEST" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_STATEID</td> | ||||
| <td align="left">10025</td> | ||||
| <td align="left"> | ||||
| <xref target="err_BAD_STATEID" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CB_PATH_DOWN</td> | ||||
| <td align="left">10048</td> | ||||
| <td align="left"> | ||||
| <xref target="err_CB_PATH_DOWN" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CLID_INUSE</td> | ||||
| <td align="left">10017</td> | ||||
| <td align="left"> | ||||
| <xref target="err_CLID_INUSE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CLIENTID_BUSY</td> | ||||
| <td align="left">10074</td> | ||||
| <td align="left"> | ||||
| <xref target="err_CLIENTID_BUSY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_COMPLETE_ALREADY</td> | ||||
| <td align="left">10054</td> | ||||
| <td align="left"> | ||||
| <xref target="err_COMPLETE_ALREADY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CONN_NOT_BOUND_TO_SESSION</td> | ||||
| <td align="left">10055</td> | ||||
| <td align="left"> | ||||
| <xref target="err_CONN_NOT_BOUND_TO_SESSION" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DEADLOCK</td> | ||||
| <td align="left">10045</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DEADLOCK" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DEADSESSION</td> | ||||
| <td align="left">10078</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DEADSESSION" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELAY</td> | ||||
| <td align="left">10008</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DELAY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELEG_ALREADY_WANTED</td> | ||||
| <td align="left">10056</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DELEG_ALREADY_WANTED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELEG_REVOKED</td> | ||||
| <td align="left">10087</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DELEG_REVOKED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DENIED</td> | ||||
| <td align="left">10010</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DENIED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DIRDELEG_UNAVAIL</td> | ||||
| <td align="left">10084</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DIRDELEG_UNAVAIL" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DQUOT</td> | ||||
| <td align="left">69</td> | ||||
| <td align="left"> | ||||
| <xref target="err_DQUOT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ENCR_ALG_UNSUPP</td> | ||||
| <td align="left">10079</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ENCR_ALG_UNSUPP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_EXIST</td> | ||||
| <td align="left">17</td> | ||||
| <td align="left"> | ||||
| <xref target="err_EXIST" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_EXPIRED</td> | ||||
| <td align="left">10011</td> | ||||
| <td align="left"> | ||||
| <xref target="err_EXPIRED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FBIG</td> | ||||
| <td align="left">27</td> | ||||
| <td align="left"> | ||||
| <xref target="err_FBIG" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FHEXPIRED</td> | ||||
| <td align="left">10014</td> | ||||
| <td align="left"> | ||||
| <xref target="err_FHEXPIRED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FILE_OPEN</td> | ||||
| <td align="left">10046</td> | ||||
| <td align="left"> | ||||
| <xref target="err_FILE_OPEN" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_GRACE</td> | ||||
| <td align="left">10013</td> | ||||
| <td align="left"> | ||||
| <xref target="err_GRACE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_HASH_ALG_UNSUPP</td> | ||||
| <td align="left">10072</td> | ||||
| <td align="left"> | ||||
| <xref target="err_HASH_ALG_UNSUPP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_INVAL</td> | ||||
| <td align="left">22</td> | ||||
| <td align="left"> | ||||
| <xref target="err_INVAL" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_IO</td> | ||||
| <td align="left">5</td> | ||||
| <td align="left"> | ||||
| <xref target="err_IO" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ISDIR</td> | ||||
| <td align="left">21</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ISDIR" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LAYOUTTRYLATER</td> | ||||
| <td align="left">10058</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LAYOUTTRYLATER" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LAYOUTUNAVAILABLE</td> | ||||
| <td align="left">10059</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LAYOUTUNAVAILABLE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LEASE_MOVED</td> | ||||
| <td align="left">10031</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LEASE_MOVED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCKED</td> | ||||
| <td align="left">10012</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LOCKED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCKS_HELD</td> | ||||
| <td align="left">10037</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LOCKS_HELD" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCK_NOTSUPP</td> | ||||
| <td align="left">10043</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LOCK_NOTSUPP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCK_RANGE</td> | ||||
| <td align="left">10028</td> | ||||
| <td align="left"> | ||||
| <xref target="err_LOCK_RANGE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MINOR_VERS_MISMATCH</td> | ||||
| <td align="left">10021</td> | ||||
| <td align="left"> | ||||
| <xref target="err_MINOR_VERS_MISMATCH" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MLINK</td> | ||||
| <td align="left">31</td> | ||||
| <td align="left"> | ||||
| <xref target="err_MLINK" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MOVED</td> | ||||
| <td align="left">10019</td> | ||||
| <td align="left"> | ||||
| <xref target="err_MOVED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NAMETOOLONG</td> | ||||
| <td align="left">63</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NAMETOOLONG" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOENT</td> | ||||
| <td align="left">2</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOENT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOFILEHANDLE</td> | ||||
| <td align="left">10020</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOFILEHANDLE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOMATCHING_LAYOUT</td> | ||||
| <td align="left">10060</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOMATCHING_LAYOUT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOSPC</td> | ||||
| <td align="left">28</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOSPC" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTDIR</td> | ||||
| <td align="left">20</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOTDIR" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTEMPTY</td> | ||||
| <td align="left">66</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOTEMPTY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTSUPP</td> | ||||
| <td align="left">10004</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOTSUPP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOT_ONLY_OP</td> | ||||
| <td align="left">10081</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOT_ONLY_OP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOT_SAME</td> | ||||
| <td align="left">10027</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NOT_SAME" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NO_GRACE</td> | ||||
| <td align="left">10033</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NO_GRACE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NXIO</td> | ||||
| <td align="left">6</td> | ||||
| <td align="left"> | ||||
| <xref target="err_NXIO" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OLD_STATEID</td> | ||||
| <td align="left">10024</td> | ||||
| <td align="left"> | ||||
| <xref target="err_OLD_STATEID" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OPENMODE</td> | ||||
| <td align="left">10038</td> | ||||
| <td align="left"> | ||||
| <xref target="err_OPENMODE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OP_ILLEGAL</td> | ||||
| <td align="left">10044</td> | ||||
| <td align="left"> | ||||
| <xref target="err_OP_ILLEGAL" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OP_NOT_IN_SESSION</td> | ||||
| <td align="left">10071</td> | ||||
| <td align="left"> | ||||
| <xref target="err_OP_NOT_IN_SESSION" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PERM</td> | ||||
| <td align="left">1</td> | ||||
| <td align="left"> | ||||
| <xref target="err_PERM" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PNFS_IO_HOLE</td> | ||||
| <td align="left">10075</td> | ||||
| <td align="left"> | ||||
| <xref target="err_PNFS_IO_HOLE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PNFS_NO_LAYOUT</td> | ||||
| <td align="left">10080</td> | ||||
| <td align="left"> | ||||
| <xref target="err_PNFS_NO_LAYOUT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECALLCONFLICT</td> | ||||
| <td align="left">10061</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RECALLCONFLICT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECLAIM_BAD</td> | ||||
| <td align="left">10034</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RECLAIM_BAD" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECLAIM_CONFLICT</td> | ||||
| <td align="left">10035</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RECLAIM_CONFLICT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REJECT_DELEG</td> | ||||
| <td align="left">10085</td> | ||||
| <td align="left"> | ||||
| <xref target="err_REJECT_DELEG" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG</td> | ||||
| <td align="left">10066</td> | ||||
| <td align="left"> | ||||
| <xref target="err_REP_TOO_BIG" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG_TO_CACHE</td> | ||||
| <td align="left">10067</td> | ||||
| <td align="left"> | ||||
| <xref target="err_REP_TOO_BIG_TO_CACHE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REQ_TOO_BIG</td> | ||||
| <td align="left">10065</td> | ||||
| <td align="left"> | ||||
| <xref target="err_REQ_TOO_BIG" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RESTOREFH</td> | ||||
| <td align="left">10030</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RESTOREFH" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RETRY_UNCACHED_REP</td> | ||||
| <td align="left">10068</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RETRY_UNCACHED_REP" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RETURNCONFLICT</td> | ||||
| <td align="left">10086</td> | ||||
| <td align="left"> | ||||
| <xref target="err_RETURNCONFLICT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ROFS</td> | ||||
| <td align="left">30</td> | ||||
| <td align="left"> | ||||
| <xref target="err_ROFS" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SAME</td> | ||||
| <td align="left">10009</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SAME" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SHARE_DENIED</td> | ||||
| <td align="left">10015</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SHARE_DENIED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQUENCE_POS</td> | ||||
| <td align="left">10064</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SEQUENCE_POS" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQ_FALSE_RETRY</td> | ||||
| <td align="left">10076</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SEQ_FALSE_RETRY" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQ_MISORDERED</td> | ||||
| <td align="left">10063</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SEQ_MISORDERED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SERVERFAULT</td> | ||||
| <td align="left">10006</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SERVERFAULT" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_STALE</td> | ||||
| <td align="left">70</td> | ||||
| <td align="left"> | ||||
| <xref target="err_STALE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_STALE_CLIENTID</td> | ||||
| <td align="left">10022</td> | ||||
| <td align="left"> | ||||
| <xref target="err_STALE_CLIENTID" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_STALE_STATEID</td> | ||||
| <td align="left">10023</td> | ||||
| <td align="left"> | ||||
| <xref target="err_STALE_STATEID" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SYMLINK</td> | ||||
| <td align="left">10029</td> | ||||
| <td align="left"> | ||||
| <xref target="err_SYMLINK" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOOSMALL</td> | ||||
| <td align="left">10005</td> | ||||
| <td align="left"> | ||||
| <xref target="err_TOOSMALL" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOO_MANY_OPS</td> | ||||
| <td align="left">10070</td> | ||||
| <td align="left"> | ||||
| <xref target="err_TOO_MANY_OPS" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_UNKNOWN_LAYOUTTYPE</td> | ||||
| <td align="left">10062</td> | ||||
| <td align="left"> | ||||
| <xref target="err_UNKNOWN_LAYOUTTYPE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_UNSAFE_COMPOUND</td> | ||||
| <td align="left">10069</td> | ||||
| <td align="left"> | ||||
| <xref target="err_UNSAFE_COMPOUND" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONGSEC</td> | ||||
| <td align="left">10016</td> | ||||
| <td align="left"> | ||||
| <xref target="err_WRONGSEC" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONG_CRED</td> | ||||
| <td align="left">10082</td> | ||||
| <td align="left"> | ||||
| <xref target="err_WRONG_CRED" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONG_TYPE</td> | ||||
| <td align="left">10083</td> | ||||
| <td align="left"> | ||||
| <xref target="err_WRONG_TYPE" format="default"/></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_XDEV</td> | ||||
| <td align="left">18</td> | ||||
| <td align="left"> | ||||
| <xref target="err_XDEV" format="default"/></td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <section anchor="errors_gen" numbered="true" toc="default"> | ||||
| <name>General Errors</name> | ||||
| <t> | ||||
| This section deals with errors that are applicable to a broad | ||||
| set of different purposes. | ||||
| </t> | ||||
| <section anchor="err_BADXDR" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADXDR (Error Code 10036)</name> | ||||
| <t> | ||||
| The arguments for this operation do not match those specified in | ||||
| the XDR definition. This includes situations in which the | ||||
| request ends before all the arguments have been seen. Note | ||||
| that this error applies when fixed enumerations (these include | ||||
| booleans) have a value within the input stream that is not | ||||
| valid for the enum. A replier may pre-parse all operations for | ||||
| a Compound procedure before doing any operation execution | ||||
| and return RPC-level XDR errors in that case. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BAD_COOKIE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_COOKIE (Error Code 10003)</name> | ||||
| <t> | ||||
| Used for operations that provide a set of information indexed by | ||||
| some quantity provided by the client or cookie sent by the | ||||
| server for an earlier invocation. Where the value cannot | ||||
| be used for its intended purpose, this error results. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DELAY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DELAY (Error Code 10008)</name> | ||||
| <t> | ||||
| For any of a number of reasons, the replier could not | ||||
| process this operation in what was deemed a reasonable | ||||
| time. The client should wait and then try the request | ||||
| with a new slot and sequence value. | ||||
| </t> | ||||
| <t> | ||||
| Some examples of scenarios that might lead to this situation: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A server that supports hierarchical storage receives a | ||||
| request to process a file that had been migrated. | ||||
| </li> | ||||
| <li> | ||||
| An operation requires a delegation recall to proceed, | ||||
| but the need to wait for this delegation to be recalled | ||||
| and returned makes processing this request in a timely fashion impossible. | ||||
| </li> | ||||
| <li> | ||||
| A request is being performed on a session being migrated | ||||
| from another server as described in <xref target="SEC11-XS-session" format="default"/>, | ||||
| and the lack of full information about the | ||||
| state of the session on the source makes it impossible | ||||
| to process the request immediately. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In such cases, returning the error NFS4ERR_DELAY allows | ||||
| necessary preparatory operations to proceed without | ||||
| holding up requester resources such as a session slot. | ||||
| After delaying for a period of time, the client can | ||||
| then re-send the operation in question, often as part | ||||
| of a nearly identical request. Because of the need to avoid | ||||
| spurious reissues of non-idempotent operations and to avoid | ||||
| acting in response to NFS4ERR_DELAY errors returned on responses | ||||
| returned from the replier's reply cache, | ||||
| integration with the session-provided reply cache is necessary. | ||||
| There are a number of cases to deal with, each of which requires | ||||
| different sorts of handling by the requester and replier: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If NFS4ERR_DELAY is returned on a SEQUENCE operation, the | ||||
| request is retried in full with the SEQUENCE operation | ||||
| containing the same slot and sequence values. In this case, | ||||
| the replier <bcp14>MUST</bcp14> avoid returning a response | ||||
| containing NFS4ERR_DELAY as the response to SEQUENCE solely | ||||
| because an earlier instance of the same request returned that error | ||||
| and it was stored in the reply cache. If the replier did this, | ||||
| the retries would not be effective as there would be no | ||||
| opportunity for the replier to see whether the condition that | ||||
| generated the NFS4ERR_DELAY had been rectified during the | ||||
| interim between the original request and the retry. | ||||
| </li> | ||||
| <li> | ||||
| If NFS4ERR_DELAY is returned on an operation other than SEQUENCE | ||||
| that validly appears as the first operation of a request, the handling | ||||
| is similar. The request can be retried in full without modification. | ||||
| In this case as well, | ||||
| the replier <bcp14>MUST</bcp14> avoid returning a response containing | ||||
| NFS4ERR_DELAY as the response to an initial operation of a request | ||||
| solely on the basis | ||||
| of its presence in the reply cache. If the replier did this, | ||||
| the retries would not be effective as there would be no | ||||
| opportunity for the replier to see whether the condition that | ||||
| generated the NFS4ERR_DELAY had been rectified during the | ||||
| interim between the original request and the retry. | ||||
| </li> | ||||
| <li> | ||||
| If NFS4ERR_DELAY is returned on an operation other than the first | ||||
| in the request, the request when retried <bcp14>MUST</bcp14> contain a SEQUENCE | ||||
| operation that is different from the original one, with either | ||||
| the slot ID or the sequence value different from that in the original | ||||
| request. Because requesters do this, there is no need for the | ||||
| replier to take special care to avoid returning an | ||||
| NFS4ERR_DELAY error obtained from the reply cache. When no non-idempotent | ||||
| operations have been processed before the NFS4ERR_DELAY was returned, | ||||
| the requester should retry the request in full, with the only | ||||
| difference from the original request being the modification to the | ||||
| slot ID or sequence value in the reissued SEQUENCE operation. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| When NFS4ERR_DELAY is returned on an operation other than the first | ||||
| within a request and there has been a non-idempotent operation | ||||
| processed before the NFS4ERR_DELAY was returned, reissuing the request as is normally | ||||
| done would incorrectly cause the re-execution of the non-idempotent operation. | ||||
| </t> | ||||
| <t> | ||||
| To avoid this situation, the client should reissue the request without the | ||||
| non-idempotent operation. The request still must use a SEQUENCE | ||||
| operation with either a different slot ID or sequence value from | ||||
| the SEQUENCE in the original request. Because this is done, there | ||||
| is no way the replier could avoid spuriously re-executing the | ||||
| non-idempotent operation since the different SEQUENCE parameters | ||||
| prevent the replier from recognizing that the non-idempotent | ||||
| operation is being retried. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that without the ability to return NFS4ERR_DELAY and the | ||||
| requester's willingness to re-send when receiving it, deadlock might | ||||
| result. For example, if a recall is done, and if the delegation | ||||
| return or operations preparatory to delegation return are held up by | ||||
| other operations that need the delegation to be returned, | ||||
| session slots might not be available. The result could be | ||||
| deadlock. | ||||
| </t> | ||||
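| <t> | ||||
| As a non-normative illustration, the four cases above can be sketched as a requester-side decision function. The names RetryPlan and plan_delay_retry, and the per-operation non-idempotent flags, are hypothetical and are not defined by this protocol; classifying each operation is left to the caller. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RetryPlan:
    """How to reissue a request that failed with NFS4ERR_DELAY."""
    # True: resend the request unchanged (same slot ID and sequence value).
    # False: the SEQUENCE must use a different slot ID or sequence value.
    resend_unmodified: bool
    # True: omit non-idempotent operations that were already processed.
    drop_executed_non_idempotent: bool


def plan_delay_retry(failed_index: int,
                     non_idempotent: Sequence[bool]) -> RetryPlan:
    """Pick a retry strategy after NFS4ERR_DELAY.

    failed_index   -- position of the operation that returned
                      NFS4ERR_DELAY (0 is the first operation)
    non_idempotent -- one flag per operation in the request, True when
                      the operation is non-idempotent (caller-supplied)
    """
    if failed_index == 0:
        # First two cases: the first operation (SEQUENCE or otherwise)
        # was delayed, so the request is retried in full, unmodified.
        return RetryPlan(resend_unmodified=True,
                         drop_executed_non_idempotent=False)
    # Last two cases: a later operation was delayed, so the retry needs
    # a different slot ID or sequence value. Already-processed
    # non-idempotent operations must not be sent again.
    already_processed = non_idempotent[:failed_index]
    return RetryPlan(resend_unmodified=False,
                     drop_executed_non_idempotent=any(already_processed))
```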
| </section> | ||||
| <section anchor="err_INVAL" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_INVAL (Error Code 22)</name> | ||||
| <t> | ||||
| The arguments for this operation are not valid for some reason, even | ||||
| though they do match those specified in the XDR definition for | ||||
| the request. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOTSUPP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOTSUPP (Error Code 10004)</name> | ||||
| <t> | ||||
| Operation not supported, either because the operation is | ||||
| an <bcp14>OPTIONAL</bcp14> one and is not supported by this server or | ||||
| because the operation <bcp14>MUST NOT</bcp14> be implemented in | ||||
| the current minor version. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SERVERFAULT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SERVERFAULT (Error Code 10006)</name> | ||||
| <t> | ||||
| An error occurred on the server that does not map to any of | ||||
| the specific legal NFSv4.1 protocol error values. The client | ||||
| should translate this into an appropriate error. UNIX clients | ||||
| may choose to translate this to EIO. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_TOOSMALL" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_TOOSMALL (Error Code 10005)</name> | ||||
| <t> | ||||
| Used where an operation returns a variable amount of data, | ||||
| with a limit specified by the client. Where the data | ||||
| returned cannot fit within the limit specified by the | ||||
| client, this error results. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_fh" numbered="true" toc="default"> | ||||
| <name>Filehandle Errors</name> | ||||
| <t> | ||||
| These errors deal with the situation in which the current | ||||
| or saved filehandle, or the filehandle passed to PUTFH | ||||
| intended to become the current filehandle, is invalid | ||||
| in some way. This includes situations in which the | ||||
| filehandle is a valid filehandle in general but is not | ||||
| of the appropriate object type for the current operation. | ||||
| </t> | ||||
| <t> | ||||
| Where the error description indicates a problem with the | ||||
| current or saved filehandle, it is to be understood that | ||||
| filehandles are only checked for the condition if they | ||||
| are implicit arguments of the operation in question. | ||||
| </t> | ||||
| <section anchor="err_BADHANDLE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADHANDLE (Error Code 10001)</name> | ||||
| <t> | ||||
| Illegal NFS filehandle for the current server. The current | ||||
| filehandle failed internal consistency checks. Once accepted | ||||
| as valid (by PUTFH), no subsequent status change can cause the | ||||
| filehandle to generate this error. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_FHEXPIRED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_FHEXPIRED (Error Code 10014)</name> | ||||
| <t> | ||||
| A current or saved filehandle that is an argument to the | ||||
| current operation is volatile and has expired at the server. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_ISDIR" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ISDIR (Error Code 21)</name> | ||||
| <t> | ||||
| The current or saved filehandle designates a directory | ||||
| when the current operation does not allow a directory to | ||||
| be accepted as the target of this operation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_MOVED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_MOVED (Error Code 10019)</name> | ||||
| <t> | ||||
| The file system that contains the current filehandle object | ||||
| is not present at the server or is not accessible with the | ||||
| network address used. It may have been made accessible on a different | ||||
| set of network addresses, relocated or | ||||
| migrated to another server, or it may have never been present. | ||||
| The client may obtain the new file system location by obtaining | ||||
| the fs_locations or fs_locations_info attribute for the | ||||
| current filehandle. For further discussion, refer to | ||||
| <xref target="presence_or_absence" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| As with the case of NFS4ERR_DELAY, it is possible that one or | ||||
| more non-idempotent operations may have been successfully executed | ||||
| within a COMPOUND before NFS4ERR_MOVED is returned. Because of | ||||
| this, once the new location is determined, the original request | ||||
| that received the NFS4ERR_MOVED should not be re-executed in full. | ||||
| Instead, the client should send a new COMPOUND with any successfully | ||||
| executed non-idempotent | ||||
| operations removed. When the client uses the same session for the | ||||
| new COMPOUND, its SEQUENCE operation should use a different slot ID or sequence value. | ||||
| </t> | ||||
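| <t> | ||||
| A minimal, non-normative sketch of building the replacement COMPOUND follows. It assumes the client records, for each operation, a name and a flag saying whether it is non-idempotent; the function name rebuild_after_moved and the example classification are hypothetical. Choosing the new slot ID or sequence value for the SEQUENCE operation is left to the caller. | ||||
| </t> | ||||

```python
from typing import List, Tuple

# (operation name, is_non_idempotent); the flag is supplied by the caller.
Op = Tuple[str, bool]


def rebuild_after_moved(ops: List[Op], failed_index: int) -> List[Op]:
    """Return the operations to send in the new COMPOUND.

    Operations before failed_index completed successfully. The
    non-idempotent ones among them are dropped so that they are not
    executed a second time; everything from failed_index on is kept.
    """
    completed = ops[:failed_index]
    kept = [op for op in completed if not op[1]]
    return kept + ops[failed_index:]
```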
| </section> | ||||
| <section anchor="err_NOFILEHANDLE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOFILEHANDLE (Error Code 10020)</name> | ||||
| <t> | ||||
| The logical current or saved filehandle value is required by | ||||
| the current operation and is not set. | ||||
| This may be a result of a malformed COMPOUND | ||||
| operation (i.e., no PUTFH or PUTROOTFH before an operation that | ||||
| requires the current filehandle be set). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOTDIR" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOTDIR (Error Code 20)</name> | ||||
| <t> | ||||
| The current (or saved) filehandle designates an object that | ||||
| is not a directory for an operation in which a directory is | ||||
| required. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_STALE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_STALE (Error Code 70)</name> | ||||
| <t> | ||||
| The current or saved filehandle value designating an argument | ||||
| to the current operation is invalid. The file referred to by | ||||
| that filehandle no longer exists or access to it has been | ||||
| revoked. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SYMLINK" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SYMLINK (Error Code 10029)</name> | ||||
| <t> | ||||
| The current filehandle designates a symbolic link when the | ||||
| current operation does not allow a symbolic link as the | ||||
| target. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_WRONG_TYPE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_WRONG_TYPE (Error Code 10083)</name> | ||||
| <t> | ||||
| The current (or saved) filehandle designates an object that | ||||
| is of an invalid type for the current operation, and there is no | ||||
| more specific error (such as NFS4ERR_ISDIR or NFS4ERR_SYMLINK) | ||||
| that applies. Note that in NFSv4.0, such situations generally | ||||
| resulted in the less-specific error NFS4ERR_INVAL. | ||||
| </t> | ||||
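| <t> | ||||
| The relationship between the type-related filehandle errors can be sketched as follows. This is a non-normative illustration: the ObjType enumeration, the function name, and the order in which the checks are made are assumptions, and individual operations may specify a different precedence. | ||||
| </t> | ||||

```python
from enum import Enum
from typing import FrozenSet, Optional


class ObjType(Enum):
    REGULAR = "regular"
    DIRECTORY = "directory"
    SYMLINK = "symlink"
    OTHER = "other"  # any remaining object type


def type_error(actual: ObjType,
               allowed: FrozenSet[ObjType]) -> Optional[str]:
    """Name the error for an object of the wrong type, or None if OK."""
    if actual in allowed:
        return None
    if actual is ObjType.SYMLINK:
        return "NFS4ERR_SYMLINK"
    if allowed == frozenset({ObjType.DIRECTORY}):
        # A directory is required and something else was supplied.
        return "NFS4ERR_NOTDIR"
    if actual is ObjType.DIRECTORY:
        return "NFS4ERR_ISDIR"
    # No more specific error applies.
    return "NFS4ERR_WRONG_TYPE"
```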
| </section> | ||||
| </section> | ||||
| <section anchor="errors_comp" numbered="true" toc="default"> | ||||
| <name>Compound Structure Errors</name> | ||||
| <t> | ||||
| This section deals with errors that relate to the overall structure | ||||
| of a Compound request (by which we mean to include both | ||||
| COMPOUND and CB_COMPOUND), rather than to particular operations. | ||||
| </t> | ||||
| <t> | ||||
| There are a number of basic constraints on the operations that | ||||
| may appear in a Compound request. Sessions add to these basic | ||||
| constraints by requiring a Sequence operation (either SEQUENCE | ||||
| or CB_SEQUENCE) at the start of the Compound. | ||||
| </t> | ||||
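| <t> | ||||
| As a non-normative illustration, the positional constraints described in this section might be checked as sketched here. The function name, the order of the checks, and the set of operations treated as permitted outside a session are assumptions made for the example and are not taken from this section. | ||||
| </t> | ||||

```python
from typing import List, Optional

SEQUENCE_OPS = {"SEQUENCE", "CB_SEQUENCE"}

# Assumption: operations treated here as permitted outside a session.
ALLOWED_WITHOUT_SESSION = {
    "EXCHANGE_ID", "CREATE_SESSION", "DESTROY_SESSION",
    "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
}


def check_compound(ops: List[str], max_ops: int) -> Optional[str]:
    """Return the name of the first structural error found, or None.

    ops     -- operation names in request order
    max_ops -- operation count negotiated when the session was created
    """
    if not ops:
        return None  # nothing to check in an empty Compound
    if ops[0] in SEQUENCE_OPS:
        if len(ops) > max_ops:
            return "NFS4ERR_TOO_MANY_OPS"
        if any(op in SEQUENCE_OPS for op in ops[1:]):
            return "NFS4ERR_SEQUENCE_POS"
        return None
    # The Compound does not start with a Sequence operation.
    if ops[0] not in ALLOWED_WITHOUT_SESSION:
        return "NFS4ERR_OP_NOT_IN_SESSION"
    if len(ops) > 1:
        return "NFS4ERR_NOT_ONLY_OP"
    return None
```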
| <section anchor="err_OK" numbered="true" toc="default"> | ||||
| <name>NFS_OK (Error Code 0)</name> | ||||
| <t> | ||||
| Indicates the operation completed successfully, in that all | ||||
| of the constituent operations completed without error. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_MINOR_VERS_MISMATCH" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021)</name> | ||||
| <t> | ||||
| The minor version specified is not one that the current listener | ||||
| supports. This value is returned in the overall status for the | ||||
| Compound but is not associated with a specific operation since | ||||
| the results will specify a result count of zero. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOT_ONLY_OP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOT_ONLY_OP (Error Code 10081)</name> | ||||
| <t> | ||||
| Certain operations, which are allowed to be executed outside | ||||
| of a session, <bcp14>MUST</bcp14> be the only operation within a Compound | ||||
| whenever the Compound does not start with a Sequence | ||||
| operation. This error results when that constraint is not met. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_OP_ILLEGAL" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_OP_ILLEGAL (Error Code 10044)</name> | ||||
| <t> | ||||
| The operation code is not a valid one for the current | ||||
| Compound procedure. The opcode | ||||
| in the result stream matched with this error is the | ||||
| ILLEGAL value, although the value that appears in the | ||||
| request stream may be different. Where an illegal | ||||
| value appears and the replier pre-parses all operations for | ||||
| a Compound procedure before doing any operation execution, | ||||
| an RPC-level XDR error may be returned. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_OP_NOT_IN_SESSION" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071)</name> | ||||
| <t> | ||||
| Most forward operations and all callback operations are only | ||||
| valid within the context of a session, so that the Compound | ||||
| request in question <bcp14>MUST</bcp14> begin with a Sequence operation. | ||||
| If an attempt is made to execute these operations outside | ||||
| the context of a session, this error results. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_REP_TOO_BIG" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_REP_TOO_BIG (Error Code 10066)</name> | ||||
| <t> | ||||
| The reply to a Compound would exceed the | ||||
| channel's negotiated maximum response size. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_REP_TOO_BIG_TO_CACHE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_REP_TOO_BIG_TO_CACHE (Error Code 10067)</name> | ||||
| <t> | ||||
| The reply to a Compound would exceed the | ||||
| channel's negotiated maximum size for replies cached in the | ||||
| reply cache when the Sequence for the current request specifies | ||||
| that this request is to be cached. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_REQ_TOO_BIG" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_REQ_TOO_BIG (Error Code 10065)</name> | ||||
| <t> | ||||
| The Compound request exceeds the | ||||
| channel's negotiated maximum size for requests. | ||||
| </t> | ||||
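| <t> | ||||
| The three size-related errors (this one and the two preceding it) can be contrasted with a short, non-normative sketch. The ChannelLimits fields stand in for the channel's negotiated limits; their names, the function name, and the order of the checks are hypothetical. A real replier would typically detect an oversized reply while building it rather than receive the size as an input. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelLimits:
    """Negotiated per-channel limits, in bytes (hypothetical names)."""
    max_request_size: int
    max_response_size: int
    max_response_size_cached: int


def size_error(limits: ChannelLimits, request_size: int,
               reply_size: int, cache_this: bool) -> Optional[str]:
    """Return the size-related error that applies, or None.

    cache_this -- True when the Sequence operation asked for the
                  reply to be kept in the reply cache
    """
    if request_size > limits.max_request_size:
        return "NFS4ERR_REQ_TOO_BIG"
    if reply_size > limits.max_response_size:
        return "NFS4ERR_REP_TOO_BIG"
    if cache_this and reply_size > limits.max_response_size_cached:
        return "NFS4ERR_REP_TOO_BIG_TO_CACHE"
    return None
```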
| </section> | ||||
| <section anchor="err_RETRY_UNCACHED_REP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RETRY_UNCACHED_REP (Error Code 10068)</name> | ||||
| <t> | ||||
| The requester has attempted a retry of a Compound | ||||
| that it previously requested not | ||||
| be placed in the reply cache. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SEQUENCE_POS" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SEQUENCE_POS (Error Code 10064)</name> | ||||
| <t> | ||||
| A Sequence operation appeared in a | ||||
| position other than the first operation of a | ||||
| Compound request. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_TOO_MANY_OPS" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_TOO_MANY_OPS (Error Code 10070)</name> | ||||
| <t> | ||||
| The Compound request has too many operations, exceeding the | ||||
| count negotiated when the session was created. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_UNSAFE_COMPOUND" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_UNSAFE_COMPOUND (Error Code 10069)</name> | ||||
| <t> | ||||
| The client has sent a COMPOUND request with an unsafe | ||||
| mix of operations -- specifically, with a non-idempotent | ||||
| operation that changes the current filehandle and that is not followed by a | ||||
| GETFH. | ||||
| </t> | ||||
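| <t> | ||||
| A non-normative sketch of the structural test follows. It assumes that "followed by a GETFH" means the GETFH is the very next operation, and it uses a small, illustrative set of operations classified as both non-idempotent and filehandle-changing; neither assumption is stated in this section, and the sketch does not address when a server chooses to return the error. | ||||
| </t> | ||||

```python
from typing import List

# Assumption: illustrative classification only, supplied for the example.
NON_IDEMPOTENT_FH_CHANGING = {"OPEN", "CREATE"}


def is_unsafe_compound(ops: List[str]) -> bool:
    """True when a non-idempotent, filehandle-changing operation is
    not immediately followed by GETFH."""
    for i, op in enumerate(ops):
        if op in NON_IDEMPOTENT_FH_CHANGING:
            # Slice instead of indexing so the last position is safe.
            if ops[i + 1:i + 2] != ["GETFH"]:
                return True
    return False
```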
| </section> | ||||
| </section> | ||||
| <section anchor="errors_fs" numbered="true" toc="default"> | ||||
| <name>File System Errors</name> | ||||
| <t> | ||||
| These errors describe situations that occurred in the underlying | ||||
| file system implementation rather than in the protocol or any | ||||
| NFSv4.x feature. | ||||
| </t> | ||||
| <section anchor="err_BADTYPE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADTYPE (Error Code 10007)</name> | ||||
| <t> | ||||
| An attempt was made to create an object with an inappropriate | ||||
| type specified to CREATE. This may be because the type | ||||
| is undefined, because the type is not supported by the | ||||
| server, or because the type is not intended to be created by CREATE | ||||
| (such as a regular file or named attribute, for | ||||
| which OPEN is used to do the file creation). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DQUOT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DQUOT (Error Code 19)</name> | ||||
| <t> | ||||
| Resource (quota) hard limit exceeded. The user's resource | ||||
| limit on the server has been exceeded. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_EXIST" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_EXIST (Error Code 17)</name> | ||||
| <t> | ||||
| A file of the specified target name (when creating, renaming, | ||||
| or linking) already exists. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_FBIG" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_FBIG (Error Code 27)</name> | ||||
| <t> | ||||
| The file is too large. The operation would have caused the file to | ||||
| grow beyond the server's limit. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_FILE_OPEN" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_FILE_OPEN (Error Code 10046)</name> | ||||
| <t> | ||||
| The operation is not allowed because a | ||||
| file involved in the operation is currently open. | ||||
| Servers may, but are not required to, disallow linking-to, | ||||
| removing, or renaming open files. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_IO" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_IO (Error Code 5)</name> | ||||
| <t> | ||||
| Indicates that an I/O error occurred for which the file system | ||||
| was unable to provide recovery. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_MLINK" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_MLINK (Error Code 31)</name> | ||||
| <t> | ||||
| The request would have caused the server's limit for the | ||||
| number of hard links a file may have to be exceeded. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOENT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOENT (Error Code 2)</name> | ||||
| <t> | ||||
| Indicates no such file or directory. The file or directory name | ||||
| specified does not exist. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOSPC" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOSPC (Error Code 28)</name> | ||||
| <t> | ||||
| Indicates there is no space left on the device. The operation would have | ||||
| caused the server's file system to exceed its limit. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOTEMPTY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOTEMPTY (Error Code 66)</name> | ||||
| <t> | ||||
| An attempt was made to remove a directory that was not | ||||
| empty. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_ROFS" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ROFS (Error Code 30)</name> | ||||
| <t> | ||||
| Indicates a read-only file system. A modifying operation was | ||||
| attempted on a read-only file system. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_XDEV" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_XDEV (Error Code 18)</name> | ||||
| <t> | ||||
| Indicates an attempt to do an operation, such as linking, that | ||||
| inappropriately crosses a boundary. This may be due to such | ||||
| boundaries as: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| that between file systems (where the fsids are different). | ||||
| </li> | ||||
| <li> | ||||
| that between different named attribute directories or | ||||
| between a named attribute directory and an ordinary | ||||
| directory. | ||||
| </li> | ||||
| <li> | ||||
| that between byte-ranges of a file system that the file system | ||||
| implementation treats as separate (for example, for space | ||||
| accounting purposes), and where cross-connection between | ||||
| the byte-ranges are not allowed. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_state_mgt" numbered="true" toc="default"> | ||||
| <name>State Management Errors</name> | ||||
| <t> | ||||
| These errors indicate problems with the stateid (or one of | ||||
| the stateids) passed to a given operation. | ||||
| This includes | ||||
| situations in which the stateid is invalid as well as | ||||
| situations in which the stateid is valid but designates | ||||
| locking state that has been revoked. | ||||
| Depending on the operation, the | ||||
| stateid when valid may designate opens, byte-range locks, | ||||
| file or directory delegations, layouts, or device maps. | ||||
| </t> | ||||
| <section anchor="err_ADMIN_REVOKED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ADMIN_REVOKED (Error Code 10047)</name> | ||||
| <t> | ||||
| A stateid designates locking state of any type that has | ||||
| been revoked due to administrative interaction, possibly | ||||
| while the lease is valid. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BAD_STATEID" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_STATEID (Error Code 10025)</name> | ||||
| <t> | ||||
| A stateid does not properly designate any valid | ||||
| state. See Sections <xref target="stateid_lifetime" format="counter"/> and | ||||
| <xref target="special_stateid" format="counter"/> | ||||
| for a discussion of how stateids are validated. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DELEG_REVOKED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DELEG_REVOKED (Error Code 10087)</name> | ||||
| <t> | ||||
| A stateid designates recallable locking state of | ||||
| any type (delegation or layout) that has been | ||||
| revoked due to the failure of the client to return | ||||
| the lock when it was recalled. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_EXPIRED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_EXPIRED (Error Code 10011)</name> | ||||
| <t> | ||||
| A stateid designates locking state of any type that has | ||||
| been revoked due to expiration of the client's lease, | ||||
| either immediately upon lease expiration, or following | ||||
| a later request for a conflicting lock. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_OLD_STATEID" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_OLD_STATEID (Error Code 10024)</name> | ||||
| <t> | ||||
| A stateid with a non-zero seqid value does not match | ||||
| the current seqid for the state designated by the | ||||
| user. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
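As an illustration only, the following Python sketch shows one way a client might group the state management errors whose descriptions say the locking state "has been revoked". The numeric values are copied from the section names above; the enumeration name, the REVOKED_STATE grouping, and the helper state_was_revoked are hypothetical and are not defined by this document.

```python
from enum import IntEnum


class StateError(IntEnum):
    """Subset of state management errors; values taken from the section names."""
    NFS4ERR_EXPIRED = 10011
    NFS4ERR_OLD_STATEID = 10024
    NFS4ERR_ADMIN_REVOKED = 10047
    NFS4ERR_DELEG_REVOKED = 10087


# Hypothetical grouping: errors described as designating revoked locking state.
REVOKED_STATE = frozenset({
    StateError.NFS4ERR_EXPIRED,
    StateError.NFS4ERR_ADMIN_REVOKED,
    StateError.NFS4ERR_DELEG_REVOKED,
})


def state_was_revoked(status: int) -> bool:
    """Return True if the status says the stateid's locking state was revoked."""
    return status in REVOKED_STATE
```

A caller could use such a helper to separate "the state is gone" from an error such as NFS4ERR_OLD_STATEID, which concerns the seqid rather than revocation.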
| <section anchor="errors_sec" numbered="true" toc="default"> | ||||
| <name>Security Errors</name> | ||||
| <t> | ||||
| These are the various permission-related errors in NFSv4.1. | ||||
| </t> | ||||
| <section anchor="err_ACCESS" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ACCESS (Error Code 13)</name> | ||||
| <t> | ||||
| Indicates permission denied. The caller does | ||||
| not have the correct permission to perform | ||||
| the requested operation. Contrast this with | ||||
| NFS4ERR_PERM (<xref target="err_PERM" format="default"/>), which | ||||
| restricts itself to owner or privileged-user | ||||
| permission failures, and NFS4ERR_WRONG_CRED | ||||
| (<xref target="err_WRONG_CRED" format="default"/>), which deals | ||||
| with appropriate permission to delete or modify | ||||
| transient objects based on the credentials of | ||||
| the user that created them. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_PERM" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_PERM (Error Code 1)</name> | ||||
| <t> | ||||
| Indicates that the requester is not the owner. The operation was not | ||||
| allowed because the caller is neither a privileged user | ||||
| (root) nor the owner of the target of the operation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_WRONGSEC" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_WRONGSEC (Error Code 10016)</name> | ||||
| <t> | ||||
| Indicates that the security mechanism being used by the client | ||||
| for the operation does not match the server's security policy. | ||||
| The client should change the security mechanism being used and | ||||
| re-send the operation (but not with the same slot ID and | ||||
| sequence ID; one or both <bcp14>MUST</bcp14> be different on the re-send). SECINFO and SECINFO_NO_NAME can be used | ||||
| to determine the appropriate mechanism. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_WRONG_CRED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_WRONG_CRED (Error Code 10082)</name> | ||||
| <t> | ||||
| An operation that manipulates state was attempted by a principal | ||||
| that was not allowed to modify that piece of state. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_name" numbered="true" toc="default"> | ||||
| <name>Name Errors</name> | ||||
| <t> | ||||
| Names in NFSv4 are UTF-8 strings. When the strings are not | ||||
| valid UTF-8 or are of length zero, the error NFS4ERR_INVAL | ||||
| results. Besides this, there are a number of other errors | ||||
| to indicate specific problems with names. | ||||
| </t> | ||||
| <section anchor="err_BADCHAR" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADCHAR (Error Code 10040)</name> | ||||
| <t> | ||||
| A UTF-8 string contains a character that is not supported | ||||
| by the server in the context in which it is being used. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BADNAME" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADNAME (Error Code 10041)</name> | ||||
| <t> | ||||
| A name string in a request consisted of valid UTF-8 | ||||
| characters supported by the server, but the name is not | ||||
| supported by the server as a valid name for the current operation. | ||||
| An example might be creating a file or directory named ".." | ||||
| on a server whose file system uses that name for links to | ||||
| parent directories. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NAMETOOLONG" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NAMETOOLONG (Error Code 63)</name> | ||||
| <t> | ||||
| Returned when the filename in an operation exceeds the | ||||
| server's implementation limit. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_locking" numbered="true" toc="default"> | ||||
| <name>Locking Errors</name> | ||||
| <t> | ||||
| This section deals with errors related to locking, covering both | ||||
| share reservations and byte-range locking. It does not deal | ||||
| with errors specific to the process of reclaiming locks. Those | ||||
| are dealt with in <xref target="errors_reclaim" format="default"/>. | ||||
| </t> | ||||
| <section anchor="err_BAD_RANGE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_RANGE (Error Code 10042)</name> | ||||
| <t> | ||||
| The byte-range of a LOCK, LOCKT, or LOCKU operation is | ||||
| not allowed by the | ||||
| server. For example, this error results when a server | ||||
| that only supports 32-bit ranges receives a range that | ||||
| cannot be handled by that server. (See | ||||
| <xref target="OP_LOCK_DESCRIPTION" format="default"/>.) | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DEADLOCK" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DEADLOCK (Error Code 10045)</name> | ||||
| <t> | ||||
| The server has been able to determine a byte-range locking | ||||
| deadlock condition for a READW_LT or WRITEW_LT LOCK operation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DENIED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DENIED (Error Code 10010)</name> | ||||
| <t> | ||||
| An attempt to lock a file is denied. Since this may be a | ||||
| temporary condition, the client is encouraged to re-send the lock | ||||
| request (but not with the same slot ID and | ||||
| sequence ID; one or both <bcp14>MUST</bcp14> be different on the re-send) until the lock is accepted. See | ||||
| <xref target="blocking_locks" format="default"/> for a discussion of the re-send. | ||||
| </t> | ||||
| </section> | ||||
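The re-send rule above (the slot ID and sequence ID must not both be the same as in the original request) might be sketched as follows. This is a minimal illustration, not a client implementation: the SlotUse type and the ids_for_resend helper are hypothetical names, and the wrap at 32 bits assumes the sequence ID is an unsigned 32-bit counter.

```python
from typing import NamedTuple


class SlotUse(NamedTuple):
    """Hypothetical record of the slot ID and sequence ID used by one request."""
    slot_id: int
    sequence_id: int


NFS4ERR_DENIED = 10010  # value taken from the section name above


def ids_for_resend(previous: SlotUse) -> SlotUse:
    """Keep the slot but advance the sequence ID, so the pair differs on re-send.

    Assumption: the sequence ID is an unsigned 32-bit value that wraps to zero.
    """
    return SlotUse(previous.slot_id, (previous.sequence_id + 1) % 2**32)
```

Choosing a different slot would satisfy the rule equally well; advancing the sequence ID on the same slot is only one of the permitted options.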
| <section anchor="err_LOCKED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LOCKED (Error Code 10012)</name> | ||||
| <t> | ||||
| A READ or WRITE operation was attempted on a file where there | ||||
| was a conflict between the I/O and an existing lock: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| There is a share reservation inconsistent with the I/O | ||||
| being done. | ||||
| </li> | ||||
| <li> | ||||
| The range to be read or written intersects an existing | ||||
| mandatory byte-range lock. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="err_LOCKS_HELD" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LOCKS_HELD (Error Code 10037)</name> | ||||
| <t> | ||||
| An operation was prevented by the unexpected presence of locks. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_LOCK_NOTSUPP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LOCK_NOTSUPP (Error Code 10043)</name> | ||||
| <t> | ||||
| A LOCK operation was attempted that would require the upgrade | ||||
| or downgrade of a byte-range lock range already held by the owner, and the | ||||
| server does not support atomic upgrade or downgrade of locks. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_LOCK_RANGE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LOCK_RANGE (Error Code 10028)</name> | ||||
| <t> | ||||
| A LOCK operation specified a range that partially overlaps a | ||||
| byte-range lock currently held by the current lock-owner, without | ||||
| precisely matching a single such byte-range lock, and the server | ||||
| does not support this type of request and thus does not | ||||
| implement POSIX locking semantics <xref target="fcntl" format="default"/>. See Sections | ||||
| <xref target="OP_LOCK_IMPLEMENTATION" format="counter"/>, | ||||
| <xref target="OP_LOCKT_IMPLEMENTATION" format="counter"/>, and | ||||
| <xref target="OP_LOCKU_IMPLEMENTATION" format="counter"/> for a discussion of | ||||
| how this applies to LOCK, LOCKT, and LOCKU respectively. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_OPENMODE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_OPENMODE (Error Code 10038)</name> | ||||
| <t> | ||||
| The client attempted a READ, WRITE, LOCK, or other operation | ||||
| not sanctioned by the stateid passed (e.g., writing to a file | ||||
| opened for read-only access). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SHARE_DENIED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SHARE_DENIED (Error Code 10015)</name> | ||||
| <t> | ||||
| An attempt to OPEN a file with a share reservation has failed | ||||
| because of a share conflict. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_reclaim" numbered="true" toc="default"> | ||||
| <name>Reclaim Errors</name> | ||||
| <t> | ||||
| These errors relate to the process of reclaiming locks after a | ||||
| server restart. | ||||
| </t> | ||||
| <section anchor="err_COMPLETE_ALREADY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_COMPLETE_ALREADY (Error Code 10054)</name> | ||||
| <t> | ||||
| The client previously sent a successful RECLAIM_COMPLETE | ||||
| operation specifying the same scope, whether that scope is global | ||||
| or for the same file system in the case of a per-fs RECLAIM_COMPLETE. | ||||
| An additional RECLAIM_COMPLETE operation is not necessary and results in this error. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_GRACE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_GRACE (Error Code 10013)</name> | ||||
| <t> | ||||
| This error is returned when the server is in its | ||||
| grace period with regard to the file system object for which | ||||
| the lock was requested. In this situation, a non-reclaim | ||||
| locking request cannot be granted. This can occur because either: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The server does not have sufficient information about locks that | ||||
| might be potentially reclaimed to determine whether the lock could | ||||
| be granted. | ||||
| </li> | ||||
| <li> | ||||
| The request is made by a client responsible for reclaiming its | ||||
| locks that has not yet done the appropriate RECLAIM_COMPLETE | ||||
| operation, allowing it to proceed to obtain new locks. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In the case of a per-fs grace period, | ||||
| there may be clients (i.e., those currently using the destination | ||||
| file system) who might be unaware of the circumstances resulting | ||||
| in the initiation of the grace period. Such clients need to | ||||
| periodically retry the request until the grace period is over, just as | ||||
| other clients do. | ||||
| </t> | ||||
| </section> | ||||
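The periodic retry described above could look roughly like the following sketch. The send callable, the delay, and the attempt limit are all assumptions made for illustration; this document does not prescribe a retry interval or a limit.

```python
import time

NFS4_OK = 0
NFS4ERR_GRACE = 10013  # value taken from the section name above


def retry_through_grace(send, delay_seconds=1.0, max_attempts=10, sleep=time.sleep):
    """Call send() until it returns something other than NFS4ERR_GRACE.

    send is a hypothetical callable that issues the non-reclaim locking
    request and returns its status code. The last status seen is returned,
    which is NFS4ERR_GRACE if the attempts run out.
    """
    status = NFS4ERR_GRACE
    for attempt in range(max_attempts):
        status = send()
        if status != NFS4ERR_GRACE:
            break
        if attempt + 1 != max_attempts:
            sleep(delay_seconds)  # wait before the next periodic retry
    return status
```

Passing the sleep function as a parameter is a design convenience so the loop can be exercised without real delays.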
| <section anchor="err_NO_GRACE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NO_GRACE (Error Code 10033)</name> | ||||
| <t> | ||||
| A reclaim of client state was attempted in circumstances in | ||||
| which the server cannot guarantee that conflicting state has | ||||
| not been provided to another client. This occurs in any of the | ||||
| following situations: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| There | ||||
| is no active grace period applying to the file system object | ||||
| for which the request was made. | ||||
| </li> | ||||
| <li> | ||||
| The client making the | ||||
| request has no current role in reclaiming locks. | ||||
| </li> | ||||
| <li> | ||||
| Previous operations have created a situation in which | ||||
| the server is not able to determine that a reclaim-interfering | ||||
| edge condition does not exist. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="err_RECLAIM_BAD" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RECLAIM_BAD (Error Code 10034)</name> | ||||
| <t> | ||||
| The server has determined that a reclaim attempted by the client | ||||
| is not valid, i.e., the lock specified as being reclaimed could | ||||
| not possibly have existed before the server restart or file | ||||
| system migration event. A server | ||||
| is not obliged to make this determination and will typically rely | ||||
| on the client to only reclaim locks that the client was granted prior | ||||
| to restart. However, | ||||
| when a server does have reliable information to enable it to make | ||||
| this determination, this error indicates that the reclaim has | ||||
| been rejected as invalid. This is as opposed to the error | ||||
| NFS4ERR_RECLAIM_CONFLICT (see <xref target="err_RECLAIM_CONFLICT" format="default"/>) | ||||
| where the server can only determine that | ||||
| there has been an invalid reclaim, but cannot determine | ||||
| which request is invalid. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_RECLAIM_CONFLICT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)</name> | ||||
| <t> | ||||
| The reclaim attempted by the client has encountered a conflict | ||||
| and cannot be satisfied. This potentially indicates a misbehaving | ||||
| client, although not necessarily the one receiving the error. | ||||
| The misbehavior might be on the part of the client that | ||||
| established the lock with which this client conflicted. See also | ||||
| <xref target="err_RECLAIM_BAD" format="default"/> for the related error, | ||||
| NFS4ERR_RECLAIM_BAD. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_pnfs" numbered="true" toc="default"> | ||||
| <name>pNFS Errors</name> | ||||
| <t> | ||||
| This section deals with pNFS-related errors including those | ||||
| that are associated with using NFSv4.1 to communicate with a | ||||
| data server. | ||||
| </t> | ||||
| <section anchor="err_BADIOMODE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADIOMODE (Error Code 10049)</name> | ||||
| <t> | ||||
| An invalid or inappropriate layout iomode was specified. | ||||
| For an example of an inappropriate layout iomode, suppose | ||||
| a client's LAYOUTGET operation specified an iomode of | ||||
| LAYOUTIOMODE4_RW, and the server is neither able nor willing | ||||
| to let the client send write requests to data servers; the server | ||||
| can reply with NFS4ERR_BADIOMODE. The client would then | ||||
| send another LAYOUTGET with an iomode of LAYOUTIOMODE4_READ. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BADLAYOUT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADLAYOUT (Error Code 10050)</name> | ||||
| <t> | ||||
| The layout specified is invalid in some way. For LAYOUTCOMMIT, | ||||
| this indicates that the specified layout is not held by the | ||||
| client or is not of mode LAYOUTIOMODE4_RW. For LAYOUTGET, | ||||
| it indicates that a layout matching the client's specification | ||||
| as to minimum length cannot be granted. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_LAYOUTTRYLATER" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LAYOUTTRYLATER (Error Code 10058)</name> | ||||
| <t> | ||||
| Layouts are temporarily unavailable for the file. The client | ||||
| should re-send later (but not with the same slot ID and | ||||
| sequence ID; one or both <bcp14>MUST</bcp14> be different on the re-send). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_LAYOUTUNAVAILABLE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LAYOUTUNAVAILABLE (Error Code 10059)</name> | ||||
| <t> | ||||
| Returned when layouts are not available for the current file | ||||
| system or the particular specified file. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOMATCHING_LAYOUT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060)</name> | ||||
| <t> | ||||
| Returned when layouts are recalled and the client has no layouts | ||||
| matching the specification of the layouts being recalled. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_PNFS_IO_HOLE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_PNFS_IO_HOLE (Error Code 10075)</name> | ||||
| <t> | ||||
| The pNFS client has attempted to read from or write to an | ||||
| illegal hole of a file of a data server that is using | ||||
| sparse packing. See <xref target="sparse_dense" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_PNFS_NO_LAYOUT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080)</name> | ||||
| <t> | ||||
| The pNFS client has attempted to read from or write to a file | ||||
| (using a request to a data server) without holding a valid | ||||
| layout. This includes the case where the client had a layout, | ||||
| but the iomode does not allow a WRITE. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_RETURNCONFLICT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RETURNCONFLICT (Error Code 10086)</name> | ||||
| <t> | ||||
| A layout | ||||
| is unavailable due to an attempt to perform the LAYOUTGET | ||||
| before a pending LAYOUTRETURN on the file has been received. | ||||
| See <xref target="layout_server_consider" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_UNKNOWN_LAYOUTTYPE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062)</name> | ||||
| <t> | ||||
| The client has specified a layout type that is not supported by | ||||
| the server. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_sess_use" numbered="true" toc="default"> | ||||
| <name>Session Use Errors</name> | ||||
| <t> | ||||
| This section deals with errors encountered when using sessions, | ||||
| that is, errors encountered when a request uses a Sequence | ||||
| (i.e., either SEQUENCE or CB_SEQUENCE) operation. | ||||
| </t> | ||||
| <section anchor="err_BADSESSION" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADSESSION (Error Code 10052)</name> | ||||
| <t> | ||||
| The specified session ID is unknown to the server | ||||
| to which the operation is addressed. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BADSLOT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADSLOT (Error Code 10053)</name> | ||||
| <t> | ||||
| The requester sent a Sequence operation | ||||
| that attempted to use a slot the replier | ||||
| does not have in its slot table. It is possible the | ||||
| slot may have been retired. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BAD_HIGH_SLOT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)</name> | ||||
| <t> | ||||
| The highest_slot argument in a Sequence operation | ||||
| exceeds the replier's enforced highest_slotid. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_CB_PATH_DOWN" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_CB_PATH_DOWN (Error Code 10048)</name> | ||||
| <t> | ||||
| There is a problem contacting the client via | ||||
| the callback path. The function of this error has | ||||
| been mostly superseded by the use of | ||||
| status flags in the reply to the SEQUENCE | ||||
| operation (see <xref target="OP_SEQUENCE" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DEADSESSION" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DEADSESSION (Error Code 10078)</name> | ||||
| <t> | ||||
| The specified session is a persistent session that is | ||||
| dead and does not accept new | ||||
| requests or perform new operations on existing requests | ||||
| (in the case in which a request was partially executed | ||||
| before server restart). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_CONN_NOT_BOUND_TO_SESSION" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055)</name> | ||||
| <t> | ||||
| A Sequence operation was sent on a connection that has not | ||||
| been associated with the specified session, | ||||
| where the client specified that connection association | ||||
| was to be enforced with SP4_MACH_CRED or SP4_SSV state protection. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SEQ_FALSE_RETRY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)</name> | ||||
| <t> | ||||
| The requester sent a Sequence operation with a | ||||
| slot ID and sequence ID that are in the reply cache, but | ||||
| the replier has detected that the retried request | ||||
| is not the same as the original request. | ||||
| See <xref target="false_retry" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SEQ_MISORDERED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SEQ_MISORDERED (Error Code 10063)</name> | ||||
| <t> | ||||
| The requester sent a Sequence operation | ||||
| with an invalid sequence ID. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_sess_mgt" numbered="true" toc="default"> | ||||
| <name>Session Management Errors</name> | ||||
| <t> | ||||
| This section deals with errors associated with requests used | ||||
| in session management. | ||||
| </t> | ||||
| <section anchor="err_BACK_CHAN_BUSY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BACK_CHAN_BUSY (Error Code 10057)</name> | ||||
| <t> | ||||
| An attempt was made to destroy a session when the session | ||||
| cannot be destroyed because the server has | ||||
| callback requests outstanding. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BAD_SESSION_DIGEST" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_SESSION_DIGEST (Error Code 10051)</name> | ||||
| <t> | ||||
| The digest used in a SET_SSV request is not valid. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_client_mgt" numbered="true" toc="default"> | ||||
| <name>Client Management Errors</name> | ||||
| <t> | ||||
| This section deals with errors associated with requests used | ||||
| to create and manage client IDs. | ||||
| </t> | ||||
| <section anchor="err_CLIENTID_BUSY" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_CLIENTID_BUSY (Error Code 10074)</name> | ||||
| <t> | ||||
| The DESTROY_CLIENTID operation has found there are | ||||
| sessions and/or unexpired state associated with the | ||||
| client ID to be destroyed. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_CLID_INUSE" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_CLID_INUSE (Error Code 10017)</name> | ||||
| <t> | ||||
| While processing an EXCHANGE_ID operation, the server was presented | ||||
| with a co_ownerid field that matches an existing client with | ||||
| valid leased state, but the principal sending the EXCHANGE_ID | ||||
| operation differs from the principal that established the existing | ||||
| client. | ||||
| This indicates a collision (most likely due to chance) between | ||||
| clients. The client should recover by changing the | ||||
| co_ownerid and re-sending EXCHANGE_ID (but not with the same slot ID and | ||||
| sequence ID; one or both <bcp14>MUST</bcp14> be different on the re-send). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_ENCR_ALG_UNSUPP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079)</name> | ||||
| <t> | ||||
| An EXCHANGE_ID was sent that specified state protection | ||||
| via SSV, and where the set of encryption algorithms presented | ||||
| by the client did not include any supported by the server. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_HASH_ALG_UNSUPP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072)</name> | ||||
| <t> | ||||
| An EXCHANGE_ID was sent that specified state protection | ||||
| via SSV, and where the set of hashing algorithms presented | ||||
| by the client did not include any supported by the server. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_STALE_CLIENTID" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_STALE_CLIENTID (Error Code 10022)</name> | ||||
| <t> | ||||
| A client ID not recognized by the server was passed to an | ||||
| operation. Note that unlike the case of NFSv4.0, client IDs | ||||
| are not passed explicitly to the server in ordinary locking | ||||
| operations and cannot result in this error. Instead, when | ||||
| there is a server restart, it is first manifested through | ||||
| an error on the associated session, and the staleness of the | ||||
| client ID is detected when trying to associate a client ID | ||||
| with a new session. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_deleg" numbered="true" toc="default"> | ||||
| <name>Delegation Errors</name> | ||||
| <t> | ||||
| This section deals with errors associated with requesting and | ||||
| returning delegations. | ||||
| </t> | ||||
| <section anchor="err_DELEG_ALREADY_WANTED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DELEG_ALREADY_WANTED (Error Code 10056)</name> | ||||
| <t> | ||||
| The client has requested a delegation when it had already | ||||
| registered that it wants that same delegation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_DIRDELEG_UNAVAIL" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_DIRDELEG_UNAVAIL (Error Code 10084)</name> | ||||
| <t> | ||||
| This error is returned when the server is unable or unwilling | ||||
| to provide a requested directory delegation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_RECALLCONFLICT" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RECALLCONFLICT (Error Code 10061)</name> | ||||
| <t> | ||||
| A recallable object (i.e., a layout or delegation) | ||||
| is unavailable due to a conflicting recall operation that is | ||||
| currently in progress for that object. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_REJECT_DELEG" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_REJECT_DELEG (Error Code 10085)</name> | ||||
| <t> | ||||
| The callback operation invoked to deal with a new delegation has | ||||
| rejected it. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_attr" numbered="true" toc="default"> | ||||
| <name>Attribute Handling Errors</name> | ||||
| <t> | ||||
| This section deals with errors specific to attribute handling | ||||
| within NFSv4. | ||||
| </t> | ||||
| <section anchor="err_ATTRNOTSUPP" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_ATTRNOTSUPP (Error Code 10032)</name> | ||||
| <t> | ||||
| An attribute specified is not supported by the server. This | ||||
| error <bcp14>MUST NOT</bcp14> be returned by the GETATTR operation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_BADOWNER" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BADOWNER (Error Code 10039)</name> | ||||
| <t> | ||||
| This error is returned when an owner or owner_group attribute value or the who | ||||
| field of an ACE within an ACL attribute value cannot be | ||||
| translated to a local representation. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NOT_SAME" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NOT_SAME (Error Code 10027)</name> | ||||
| <t> | ||||
| This error is returned by the VERIFY operation to signify | ||||
| that the attributes compared were not the same as those provided | ||||
| in the client's request. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_SAME" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_SAME (Error Code 10009)</name> | ||||
| <t> | ||||
| This error is returned by the NVERIFY operation to signify | ||||
| that the attributes compared were the same as those provided | ||||
| in the client's request. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="errors_obs" numbered="true" toc="default"> | ||||
| <name>Obsoleted Errors</name> | ||||
| <t> | ||||
| These errors <bcp14>MUST NOT</bcp14> be generated by any NFSv4.1 operation. | ||||
| This can be for a number of reasons. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The function provided by the error has been superseded | ||||
| by one of the status bits returned by the SEQUENCE | ||||
| operation. | ||||
| </li> | ||||
| <li> | ||||
| The new session structure and associated change in | ||||
| locking have made the error unnecessary. | ||||
| </li> | ||||
| <li> | ||||
| There has been a restructuring of some errors for | ||||
| NFSv4.1 that resulted in the elimination of certain errors. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="err_BAD_SEQID" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_BAD_SEQID (Error Code 10026)</name> | ||||
| <t> | ||||
| The sequence number (seqid) in a locking request is neither the | ||||
| next expected number nor the last number processed. These | ||||
| seqids are ignored in NFSv4.1. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_LEASE_MOVED" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_LEASE_MOVED (Error Code 10031)</name> | ||||
| <t> | ||||
| A lease being renewed is associated with a file system | ||||
| that has been migrated to a new server. The error has | ||||
| been superseded by the SEQ4_STATUS_LEASE_MOVED status bit | ||||
| (see <xref target="OP_SEQUENCE" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_NXIO" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_NXIO (Error Code 5)</name> | ||||
| <t> | ||||
| I/O error. No such device or address. This error applies | ||||
| to block and character device access; because NFSv4.1 is | ||||
| not a device-access protocol, this error is not applicable. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_RESTOREFH" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_RESTOREFH (Error Code 10030)</name> | ||||
| <t> | ||||
| The RESTOREFH operation does not have a saved filehandle | ||||
| (identified by SAVEFH) to operate upon. In NFSv4.1, this error has | ||||
| been superseded by NFS4ERR_NOFILEHANDLE. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="err_STALE_STATEID" numbered="true" toc="default"> | ||||
| <name>NFS4ERR_STALE_STATEID (Error Code 10023)</name> | ||||
| <t> | ||||
| A stateid generated by an earlier server instance was | ||||
| used. This error is moot in NFSv4.1 because all operations that | ||||
| take a stateid <bcp14>MUST</bcp14> be preceded by the SEQUENCE operation, | ||||
| and the earlier server instance is detected by the session | ||||
| infrastructure that supports SEQUENCE. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] When adding new errors above, add them to the next section under --> | ||||
| <!-- [auth] the appropriate operation; the next table for errors to --> | ||||
| <!-- [auth] operations is automatically generated. --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Operations and Their Valid Errors</name> | ||||
| <t> | ||||
| This section contains a table that gives the valid error returns | ||||
| for each protocol operation. The error code NFS4_OK (indicating | ||||
| no error) is not listed but should be understood to be returnable | ||||
| by all operations with two important exceptions: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The operations that <bcp14>MUST NOT</bcp14> be implemented: | ||||
| OPEN_CONFIRM, RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and | ||||
| SETCLIENTID_CONFIRM. | ||||
| </li> | ||||
| <li> | ||||
| The invalid operation: ILLEGAL. | ||||
| </li> | ||||
| </ul> | ||||
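<t>
The two exceptions above, together with the per-operation error lists,
can be sketched as a simple lookup. This is a non-normative illustration:
only three rows of the table in this section are reproduced, and the
dictionary layout and function name are assumptions made for the example.
</t>

```python
# Operations that must not be implemented, as listed above.
NOT_IMPLEMENTED_OPS = frozenset({
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
})

# Excerpt only: three rows copied from the table in this section.
VALID_ERRORS = {
    "GETFH": frozenset({
        "NFS4ERR_FHEXPIRED",
        "NFS4ERR_MOVED",
        "NFS4ERR_NOFILEHANDLE",
        "NFS4ERR_OP_NOT_IN_SESSION",
        "NFS4ERR_STALE",
    }),
    "ILLEGAL": frozenset({"NFS4ERR_BADXDR", "NFS4ERR_OP_ILLEGAL"}),
    "OPEN_CONFIRM": frozenset({"NFS4ERR_NOTSUPP"}),
}


def is_valid_return(op, status):
    # NFS4_OK is returnable by every operation except the two
    # exceptions: unimplemented operations and ILLEGAL.
    if status == "NFS4_OK":
        return op not in NOT_IMPLEMENTED_OPS and op != "ILLEGAL"
    # Any other status must appear in the operation's row.
    return status in VALID_ERRORS.get(op, frozenset())
```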
| <table anchor="op_error_returns" align="center"> | ||||
| <name>Valid Error Returns for Each Protocol Operation</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Operation</th> | ||||
| <th align="left">Errors</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">ACCESS</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">BACKCHANNEL_CTL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">BIND_CONN_TO_SESSION</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADSESSION, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_SESSION_DIGEST, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOT_ONLY_OP, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CLOSE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_LOCKS_HELD, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">COMMIT</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CREATE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADOWNER, | ||||
| NFS4ERR_BADTYPE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXIST, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MLINK, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_PERM, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNSAFE_COMPOUND | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CREATE_SESSION</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_CLID_INUSE, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOT_ONLY_OP, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SEQ_MISORDERED, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE_CLIENTID, | ||||
| NFS4ERR_TOOSMALL, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DELEGPURGE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DELEGRETURN</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DESTROY_CLIENTID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_CLIENTID_BUSY, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_NOT_ONLY_OP, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE_CLIENTID, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DESTROY_SESSION</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BACK_CHAN_BUSY, | ||||
| NFS4ERR_BADSESSION, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_CB_PATH_DOWN, | ||||
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_NOT_ONLY_OP, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE_CLIENTID, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">EXCHANGE_ID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_CLID_INUSE, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_ENCR_ALG_UNSUPP, | ||||
| NFS4ERR_HASH_ALG_UNSUPP, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOT_ONLY_OP, | ||||
| NFS4ERR_NOT_SAME, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">FREE_STATEID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_LOCKS_HELD, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GET_DIR_DELEGATION</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DIRDELEG_UNAVAIL, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GETATTR</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GETDEVICEINFO</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOOSMALL, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GETDEVICELIST</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_COOKIE, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_NOT_SAME, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">GETFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_STALE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">ILLEGAL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_OP_ILLEGAL | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LAYOUTCOMMIT</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADIOMODE, | ||||
| NFS4ERR_BADLAYOUT, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FBIG, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_NO_GRACE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_RECLAIM_BAD, | ||||
| NFS4ERR_RECLAIM_CONFLICT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LAYOUTGET</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADIOMODE, | ||||
| NFS4ERR_BADLAYOUT, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_LAYOUTTRYLATER, | ||||
| NFS4ERR_LAYOUTUNAVAILABLE, | ||||
| NFS4ERR_LOCKED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OPENMODE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_RECALLCONFLICT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOOSMALL, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LAYOUTRETURN</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_NO_GRACE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_CRED, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LINK</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXIST, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_FILE_OPEN, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MLINK, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC, | ||||
| NFS4ERR_WRONG_TYPE, | ||||
| NFS4ERR_XDEV | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOCK</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_RANGE, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADLOCK, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DENIED, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_LOCK_NOTSUPP, | ||||
| NFS4ERR_LOCK_RANGE, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NO_GRACE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OPENMODE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_RECLAIM_BAD, | ||||
| NFS4ERR_RECLAIM_CONFLICT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOCKT</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_RANGE, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DENIED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_LOCK_RANGE, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOCKU</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_RANGE, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_LOCK_RANGE, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOOKUP</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LOOKUPP</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NVERIFY</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SAME, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">OPEN</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADOWNER, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_ALREADY_WANTED, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXIST, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FBIG, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NO_GRACE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_PERM, | ||||
| NFS4ERR_RECLAIM_BAD, | ||||
| NFS4ERR_RECLAIM_CONFLICT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_SHARE_DENIED, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNSAFE_COMPOUND, | ||||
| NFS4ERR_WRONGSEC, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">OPEN_CONFIRM</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_NOTSUPP | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">OPEN_DOWNGRADE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">OPENATTR</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNSAFE_COMPOUND, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">PUTFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">PUTPUBFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">PUTROOTFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">READ</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_LOCKED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OPENMODE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_PNFS_IO_HOLE, | ||||
| NFS4ERR_PNFS_NO_LAYOUT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">READDIR</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_COOKIE, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOT_SAME, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOOSMALL, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">READLINK</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RECLAIM_COMPLETE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_COMPLETE_ALREADY, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_CRED, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RELEASE_LOCKOWNER</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_NOTSUPP | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">REMOVE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_FILE_OPEN, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOTEMPTY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RENAME</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXIST, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_FILE_OPEN, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MLINK, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOTEMPTY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC, | ||||
| NFS4ERR_XDEV | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RENEW</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_NOTSUPP | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RESTOREFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONGSEC | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SAVEFH</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SECINFO</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADNAME, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NAMETOOLONG, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SECINFO_NO_NAME</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOENT, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTDIR, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SEQUENCE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADSESSION, | ||||
| NFS4ERR_BADSLOT, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_HIGH_SLOT, | ||||
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SEQUENCE_POS, | ||||
| NFS4ERR_SEQ_FALSE_RETRY, | ||||
| NFS4ERR_SEQ_MISORDERED, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SET_SSV</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_SESSION_DIGEST, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SETATTR</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADOWNER, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FBIG, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_LOCKED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OPENMODE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_PERM, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SETCLIENTID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_NOTSUPP | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">SETCLIENTID_CONFIRM</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_NOTSUPP | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">TEST_STATEID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">VERIFY</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ATTRNOTSUPP, | ||||
| NFS4ERR_BADCHAR, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOT_SAME, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">WANT_DELEGATION</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_ALREADY_WANTED, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_NO_GRACE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_RECALLCONFLICT, | ||||
| NFS4ERR_RECLAIM_BAD, | ||||
| NFS4ERR_RECLAIM_CONFLICT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">WRITE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_ACCESS, | ||||
| NFS4ERR_ADMIN_REVOKED, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DEADSESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_DELEG_REVOKED, | ||||
| NFS4ERR_DQUOT, | ||||
| NFS4ERR_EXPIRED, | ||||
| NFS4ERR_FBIG, | ||||
| NFS4ERR_FHEXPIRED, | ||||
| NFS4ERR_GRACE, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_IO, | ||||
| NFS4ERR_ISDIR, | ||||
| NFS4ERR_LOCKED, | ||||
| NFS4ERR_MOVED, | ||||
| NFS4ERR_NOFILEHANDLE, | ||||
| NFS4ERR_NOSPC, | ||||
| NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_OPENMODE, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_PNFS_IO_HOLE, | ||||
| NFS4ERR_PNFS_NO_LAYOUT, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_ROFS, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_STALE, | ||||
| NFS4ERR_SYMLINK, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <!-- [auth] When adding new errors above, add them to the next section under --> | ||||
| <!-- [auth] the appropriate operation; the next table for errors to --> | ||||
| <!-- [auth] operations is automatically generated. --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Callback Operations and Their Valid Errors</name> | ||||
| <t> | ||||
| This section contains a table that gives the valid error returns | ||||
| for each callback operation. The error code NFS4_OK (indicating | ||||
| no error) is not listed, but it can be returned by every callback | ||||
| operation except CB_ILLEGAL. | ||||
| </t> | ||||
| <table anchor="cb_op_error_returns" align="center"> | ||||
| <name>Valid Error Returns for Each Protocol Callback Operation</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Callback Operation</th> | ||||
| <th align="left">Errors</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">CB_GETATTR</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_ILLEGAL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_OP_ILLEGAL | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_LAYOUTRECALL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADIOMODE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOMATCHING_LAYOUT, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_UNKNOWN_LAYOUTTYPE, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_NOTIFY</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_NOTIFY_DEVICEID</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_NOTIFY_LOCK</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_PUSH_DELEG</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REJECT_DELEG, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS, | ||||
| NFS4ERR_WRONG_TYPE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_RECALL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADHANDLE, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_STATEID, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_RECALL_ANY</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_RECALLABLE_OBJ_AVAIL</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_INVAL, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_RECALL_SLOT</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_HIGH_SLOT, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_SEQUENCE</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADSESSION, | ||||
| NFS4ERR_BADSLOT, | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_BAD_HIGH_SLOT, | ||||
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SEQUENCE_POS, | ||||
| NFS4ERR_SEQ_FALSE_RETRY, | ||||
| NFS4ERR_SEQ_MISORDERED, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">CB_WANTS_CANCELLED</td> | ||||
| <td align="left"> | ||||
| NFS4ERR_BADXDR, | ||||
| NFS4ERR_DELAY, | ||||
| NFS4ERR_NOTSUPP, | ||||
| NFS4ERR_OP_NOT_IN_SESSION, | ||||
| NFS4ERR_REP_TOO_BIG, | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, | ||||
| NFS4ERR_REQ_TOO_BIG, | ||||
| NFS4ERR_RETRY_UNCACHED_REP, | ||||
| NFS4ERR_SERVERFAULT, | ||||
| NFS4ERR_TOO_MANY_OPS | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <!-- [auth] INCLUDE THE AUTO GENERATED ERROR TO OP TABLE --> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Errors and the Operations That Use Them</name> | ||||
| <table anchor="error_op_returns" align="center"> | ||||
| <name>Errors and the Operations That Use Them</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Error</th> | ||||
| <th align="left">Operations</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ACCESS</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| GETATTR, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ADMIN_REVOKED</td> | ||||
| <td align="left"> | ||||
| CLOSE, | ||||
| DELEGRETURN, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| LOCKU, | ||||
| OPEN, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ATTRNOTSUPP</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LAYOUTCOMMIT, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| SETATTR, | ||||
| VERIFY | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BACK_CHAN_BUSY</td> | ||||
| <td align="left"> | ||||
| DESTROY_SESSION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADCHAR</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| EXCHANGE_ID, | ||||
| LINK, | ||||
| LOOKUP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SETATTR, | ||||
| VERIFY | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADHANDLE</td> | ||||
| <td align="left"> | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| PUTFH | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADIOMODE</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADLAYOUT</td> | ||||
| <td align="left"> | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADNAME</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LINK, | ||||
| LOOKUP, | ||||
| OPEN, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADOWNER</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| OPEN, | ||||
| SETATTR | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADSESSION</td> | ||||
| <td align="left"> | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_SEQUENCE, | ||||
| DESTROY_SESSION, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADSLOT</td> | ||||
| <td align="left"> | ||||
| CB_SEQUENCE, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADTYPE</td> | ||||
| <td align="left"> | ||||
| CREATE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADXDR</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_ILLEGAL, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| ILLEGAL, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_COOKIE</td> | ||||
| <td align="left"> | ||||
| GETDEVICELIST, | ||||
| READDIR | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_HIGH_SLOT</td> | ||||
| <td align="left"> | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_RANGE</td> | ||||
| <td align="left"> | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_SESSION_DIGEST</td> | ||||
| <td align="left"> | ||||
| BIND_CONN_TO_SESSION, | ||||
| SET_SSV | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BAD_STATEID</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_RECALL, | ||||
| CLOSE, | ||||
| DELEGRETURN, | ||||
| FREE_STATEID, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| LOCKU, | ||||
| OPEN, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CB_PATH_DOWN</td> | ||||
| <td align="left"> | ||||
| DESTROY_SESSION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CLID_INUSE</td> | ||||
| <td align="left"> | ||||
| CREATE_SESSION, | ||||
| EXCHANGE_ID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CLIENTID_BUSY</td> | ||||
| <td align="left"> | ||||
| DESTROY_CLIENTID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_COMPLETE_ALREADY</td> | ||||
| <td align="left"> | ||||
| RECLAIM_COMPLETE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_CONN_NOT_BOUND_TO_SESSION</td> | ||||
| <td align="left"> | ||||
| CB_SEQUENCE, | ||||
| DESTROY_SESSION, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DEADLOCK</td> | ||||
| <td align="left"> | ||||
| LOCK | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DEADSESSION</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELAY</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELEG_ALREADY_WANTED</td> | ||||
| <td align="left"> | ||||
| OPEN, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELEG_REVOKED</td> | ||||
| <td align="left"> | ||||
| DELEGRETURN, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| OPEN, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DENIED</td> | ||||
| <td align="left"> | ||||
| LOCK, | ||||
| LOCKT | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DIRDELEG_UNAVAIL</td> | ||||
| <td align="left"> | ||||
| GET_DIR_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DQUOT</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LAYOUTGET, | ||||
| LINK, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| RENAME, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ENCR_ALG_UNSUPP</td> | ||||
| <td align="left"> | ||||
| EXCHANGE_ID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_EXIST</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LINK, | ||||
| OPEN, | ||||
| RENAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_EXPIRED</td> | ||||
| <td align="left"> | ||||
| CLOSE, | ||||
| DELEGRETURN, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| LOCKU, | ||||
| OPEN, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FBIG</td> | ||||
| <td align="left"> | ||||
| LAYOUTCOMMIT, | ||||
| OPEN, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FHEXPIRED</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| DELEGRETURN, | ||||
| GETATTR, | ||||
| GETDEVICELIST, | ||||
| GETFH, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_FILE_OPEN</td> | ||||
| <td align="left"> | ||||
| LINK, | ||||
| REMOVE, | ||||
| RENAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_GRACE</td> | ||||
| <td align="left"> | ||||
| GETATTR, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| READ, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_HASH_ALG_UNSUPP</td> | ||||
| <td align="left"> | ||||
| EXCHANGE_ID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_INVAL</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGRETURN, | ||||
| EXCHANGE_ID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_IO</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| GETATTR, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LINK, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ISDIR</td> | ||||
| <td align="left"> | ||||
| COMMIT, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| OPEN, | ||||
| READ, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LAYOUTTRYLATER</td> | ||||
| <td align="left"> | ||||
| LAYOUTGET | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LAYOUTUNAVAILABLE</td> | ||||
| <td align="left"> | ||||
| LAYOUTGET | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCKED</td> | ||||
| <td align="left"> | ||||
| LAYOUTGET, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCKS_HELD</td> | ||||
| <td align="left"> | ||||
| CLOSE, | ||||
| FREE_STATEID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCK_NOTSUPP</td> | ||||
| <td align="left"> | ||||
| LOCK | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_LOCK_RANGE</td> | ||||
| <td align="left"> | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MLINK</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LINK, | ||||
| RENAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MOVED</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| DELEGRETURN, | ||||
| GETATTR, | ||||
| GETFH, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NAMETOOLONG</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LINK, | ||||
| LOOKUP, | ||||
| OPEN, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOENT</td> | ||||
| <td align="left"> | ||||
| BACKCHANNEL_CTL, | ||||
| CREATE_SESSION, | ||||
| EXCHANGE_ID, | ||||
| GETDEVICEINFO, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOFILEHANDLE</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| DELEGRETURN, | ||||
| GETATTR, | ||||
| GETDEVICELIST, | ||||
| GETFH, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOMATCHING_LAYOUT</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOSPC</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| LAYOUTGET, | ||||
| LINK, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| RENAME, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTDIR</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| GET_DIR_DELEGATION, | ||||
| LINK, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| OPEN, | ||||
| READDIR, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTEMPTY</td> | ||||
| <td align="left"> | ||||
| REMOVE, | ||||
| RENAME | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOTSUPP</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_WANTS_CANCELLED, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| OPENATTR, | ||||
| OPEN_CONFIRM, | ||||
| RELEASE_LOCKOWNER, | ||||
| RENEW, | ||||
| SECINFO_NO_NAME, | ||||
| SETCLIENTID, | ||||
| SETCLIENTID_CONFIRM, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOT_ONLY_OP</td> | ||||
| <td align="left"> | ||||
| BIND_CONN_TO_SESSION, | ||||
| CREATE_SESSION, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NOT_SAME</td> | ||||
| <td align="left"> | ||||
| EXCHANGE_ID, | ||||
| GETDEVICELIST, | ||||
| READDIR, | ||||
| VERIFY | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_NO_GRACE</td> | ||||
| <td align="left"> | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| OPEN, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OLD_STATEID</td> | ||||
| <td align="left"> | ||||
| CLOSE, | ||||
| DELEGRETURN, | ||||
| FREE_STATEID, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| LOCKU, | ||||
| OPEN, | ||||
| OPEN_DOWNGRADE, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OPENMODE</td> | ||||
| <td align="left"> | ||||
| LAYOUTGET, | ||||
| LOCK, | ||||
| READ, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OP_ILLEGAL</td> | ||||
| <td align="left"> | ||||
| CB_ILLEGAL, | ||||
| ILLEGAL | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_OP_NOT_IN_SESSION</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GETFH, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PERM</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| OPEN, | ||||
| SETATTR | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PNFS_IO_HOLE</td> | ||||
| <td align="left"> | ||||
| READ, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_PNFS_NO_LAYOUT</td> | ||||
| <td align="left"> | ||||
| READ, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECALLCONFLICT</td> | ||||
| <td align="left"> | ||||
| LAYOUTGET, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECLAIM_BAD</td> | ||||
| <td align="left"> | ||||
| LAYOUTCOMMIT, | ||||
| LOCK, | ||||
| OPEN, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RECLAIM_CONFLICT</td> | ||||
| <td align="left"> | ||||
| LAYOUTCOMMIT, | ||||
| LOCK, | ||||
| OPEN, | ||||
| WANT_DELEGATION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REJECT_DELEG</td> | ||||
| <td align="left"> | ||||
| CB_PUSH_DELEG | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG_TO_CACHE</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REQ_TOO_BIG</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_RETRY_UNCACHED_REP</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_ROFS</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| SETATTR, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SAME</td> | ||||
| <td align="left"> | ||||
| NVERIFY | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQUENCE_POS</td> | ||||
| <td align="left"> | ||||
| CB_SEQUENCE, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQ_FALSE_RETRY</td> | ||||
| <td align="left"> | ||||
| CB_SEQUENCE, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SEQ_MISORDERED</td> | ||||
| <td align="left"> | ||||
| CB_SEQUENCE, | ||||
| CREATE_SESSION, | ||||
| SEQUENCE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SERVERFAULT</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SHARE_DENIED</td> | ||||
| <td align="left"> | ||||
| OPEN | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_STALE</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| DELEGRETURN, | ||||
| GETATTR, | ||||
| GETFH, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_STALE_CLIENTID</td> | ||||
| <td align="left"> | ||||
| CREATE_SESSION, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SYMLINK</td> | ||||
| <td align="left"> | ||||
| COMMIT, | ||||
| LAYOUTCOMMIT, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| OPEN, | ||||
| READ, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOOSMALL</td> | ||||
| <td align="left"> | ||||
| CREATE_SESSION, | ||||
| GETDEVICEINFO, | ||||
| LAYOUTGET, | ||||
| READDIR | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOO_MANY_OPS</td> | ||||
| <td align="left"> | ||||
| ACCESS, | ||||
| BACKCHANNEL_CTL, | ||||
| BIND_CONN_TO_SESSION, | ||||
| CB_GETATTR, | ||||
| CB_LAYOUTRECALL, | ||||
| CB_NOTIFY, | ||||
| CB_NOTIFY_DEVICEID, | ||||
| CB_NOTIFY_LOCK, | ||||
| CB_PUSH_DELEG, | ||||
| CB_RECALL, | ||||
| CB_RECALLABLE_OBJ_AVAIL, | ||||
| CB_RECALL_ANY, | ||||
| CB_RECALL_SLOT, | ||||
| CB_SEQUENCE, | ||||
| CB_WANTS_CANCELLED, | ||||
| CLOSE, | ||||
| COMMIT, | ||||
| CREATE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| EXCHANGE_ID, | ||||
| FREE_STATEID, | ||||
| GETATTR, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| GET_DIR_DELEGATION, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| OPEN_DOWNGRADE, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| READ, | ||||
| READDIR, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| REMOVE, | ||||
| RENAME, | ||||
| RESTOREFH, | ||||
| SAVEFH, | ||||
| SECINFO, | ||||
| SECINFO_NO_NAME, | ||||
| SEQUENCE, | ||||
| SETATTR, | ||||
| SET_SSV, | ||||
| TEST_STATEID, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_UNKNOWN_LAYOUTTYPE</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL, | ||||
| GETDEVICEINFO, | ||||
| GETDEVICELIST, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| NVERIFY, | ||||
| SETATTR, | ||||
| VERIFY | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_UNSAFE_COMPOUND</td> | ||||
| <td align="left"> | ||||
| CREATE, | ||||
| OPEN, | ||||
| OPENATTR | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONGSEC</td> | ||||
| <td align="left"> | ||||
| LINK, | ||||
| LOOKUP, | ||||
| LOOKUPP, | ||||
| OPEN, | ||||
| PUTFH, | ||||
| PUTPUBFH, | ||||
| PUTROOTFH, | ||||
| RENAME, | ||||
| RESTOREFH | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONG_CRED</td> | ||||
| <td align="left"> | ||||
| CLOSE, | ||||
| CREATE_SESSION, | ||||
| DELEGPURGE, | ||||
| DELEGRETURN, | ||||
| DESTROY_CLIENTID, | ||||
| DESTROY_SESSION, | ||||
| FREE_STATEID, | ||||
| LAYOUTCOMMIT, | ||||
| LAYOUTRETURN, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| LOCKU, | ||||
| OPEN_DOWNGRADE, | ||||
| RECLAIM_COMPLETE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_WRONG_TYPE</td> | ||||
| <td align="left"> | ||||
| CB_LAYOUTRECALL, | ||||
| CB_PUSH_DELEG, | ||||
| COMMIT, | ||||
| GETATTR, | ||||
| LAYOUTGET, | ||||
| LAYOUTRETURN, | ||||
| LINK, | ||||
| LOCK, | ||||
| LOCKT, | ||||
| NVERIFY, | ||||
| OPEN, | ||||
| OPENATTR, | ||||
| READ, | ||||
| READLINK, | ||||
| RECLAIM_COMPLETE, | ||||
| SETATTR, | ||||
| VERIFY, | ||||
| WANT_DELEGATION, | ||||
| WRITE | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_XDEV</td> | ||||
| <td align="left"> | ||||
| LINK, | ||||
| RENAME | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
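<t>
The table above can be read as a lookup from an error code to the set of operations allowed to return it. The following is a minimal sketch in Python, using only a few rows copied from the table. The names VALID_OPS_BY_ERROR and may_return are hypothetical and are not part of the protocol.
</t>

```python
# Illustrative subset of the error-to-operations table; the rows are copied
# from the table above. The names here are hypothetical helpers.
VALID_OPS_BY_ERROR = {
    "NFS4ERR_LOCK_NOTSUPP": {"LOCK"},
    "NFS4ERR_LOCKS_HELD": {"CLOSE", "FREE_STATEID"},
    "NFS4ERR_NOTEMPTY": {"REMOVE", "RENAME"},
    "NFS4ERR_SAME": {"NVERIFY"},
    "NFS4ERR_SHARE_DENIED": {"OPEN"},
    "NFS4ERR_XDEV": {"LINK", "RENAME"},
}


def may_return(operation, error):
    """Return True if the table lists `operation` for `error`.

    Errors missing from this subset raise KeyError instead of guessing.
    """
    return operation in VALID_OPS_BY_ERROR[error]
```

<t>
For example, may_return("RENAME", "NFS4ERR_XDEV") is true, while may_return("OPEN", "NFS4ERR_XDEV") is false. A test harness might use such a check to flag replies that carry an error the table does not list for the operation.
</t>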
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="nfsv41procedures" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Procedures</name> | ||||
| <t> | ||||
| Both procedures, NULL and COMPOUND, <bcp14>MUST</bcp14> be implemented. | ||||
| </t> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="PROC_NULL" numbered="true" toc="default"> | ||||
| <name>Procedure 0: NULL - No Operation</name> | ||||
| <section toc="exclude" anchor="PROC_NULL_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_NULL_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_NULL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This is the standard NULL procedure with the standard void argument and | ||||
| void response. | ||||
| This procedure has no functionality associated with it. Because of | ||||
| this, it is sometimes used to measure the overhead of processing a | ||||
| service request. Therefore, the server <bcp14>SHOULD</bcp14> ensure that no | ||||
| unnecessary work is done in servicing this procedure. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_NULL_ERRORS" numbered="true"> | ||||
| <name>ERRORS</name> | ||||
| <t> | ||||
| None. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_COMPOUND" numbered="true" toc="default"> | ||||
| <name>Procedure 1: COMPOUND - Compound Operations</name> | ||||
| <section toc="exclude" anchor="OP_COMPOUND_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum nfs_opnum4 { | ||||
| OP_ACCESS = 3, | ||||
| OP_CLOSE = 4, | ||||
| OP_COMMIT = 5, | ||||
| OP_CREATE = 6, | ||||
| OP_DELEGPURGE = 7, | ||||
| OP_DELEGRETURN = 8, | ||||
| OP_GETATTR = 9, | ||||
| OP_GETFH = 10, | ||||
| OP_LINK = 11, | ||||
| OP_LOCK = 12, | ||||
| OP_LOCKT = 13, | ||||
| OP_LOCKU = 14, | ||||
| OP_LOOKUP = 15, | ||||
| OP_LOOKUPP = 16, | ||||
| OP_NVERIFY = 17, | ||||
| OP_OPEN = 18, | ||||
| OP_OPENATTR = 19, | ||||
| OP_OPEN_CONFIRM = 20, /* Mandatory not-to-implement */ | ||||
| OP_OPEN_DOWNGRADE = 21, | ||||
| OP_PUTFH = 22, | ||||
| OP_PUTPUBFH = 23, | ||||
| OP_PUTROOTFH = 24, | ||||
| OP_READ = 25, | ||||
| OP_READDIR = 26, | ||||
| OP_READLINK = 27, | ||||
| OP_REMOVE = 28, | ||||
| OP_RENAME = 29, | ||||
| OP_RENEW = 30, /* Mandatory not-to-implement */ | ||||
| OP_RESTOREFH = 31, | ||||
| OP_SAVEFH = 32, | ||||
| OP_SECINFO = 33, | ||||
| OP_SETATTR = 34, | ||||
| OP_SETCLIENTID = 35, /* Mandatory not-to-implement */ | ||||
| OP_SETCLIENTID_CONFIRM = 36, /* Mandatory not-to-implement */ | ||||
| OP_VERIFY = 37, | ||||
| OP_WRITE = 38, | ||||
| OP_RELEASE_LOCKOWNER = 39, /* Mandatory not-to-implement */ | ||||
| /* new operations for NFSv4.1 */ | ||||
| OP_BACKCHANNEL_CTL = 40, | ||||
| OP_BIND_CONN_TO_SESSION = 41, | ||||
| OP_EXCHANGE_ID = 42, | ||||
| OP_CREATE_SESSION = 43, | ||||
| OP_DESTROY_SESSION = 44, | ||||
| OP_FREE_STATEID = 45, | ||||
| OP_GET_DIR_DELEGATION = 46, | ||||
| OP_GETDEVICEINFO = 47, | ||||
| OP_GETDEVICELIST = 48, | ||||
| OP_LAYOUTCOMMIT = 49, | ||||
| OP_LAYOUTGET = 50, | ||||
| OP_LAYOUTRETURN = 51, | ||||
| OP_SECINFO_NO_NAME = 52, | ||||
| OP_SEQUENCE = 53, | ||||
| OP_SET_SSV = 54, | ||||
| OP_TEST_STATEID = 55, | ||||
| OP_WANT_DELEGATION = 56, | ||||
| OP_DESTROY_CLIENTID = 57, | ||||
| OP_RECLAIM_COMPLETE = 58, | ||||
| OP_ILLEGAL = 10044 | ||||
| }; | ||||
| union nfs_argop4 switch (nfs_opnum4 argop) { | ||||
| case OP_ACCESS: ACCESS4args opaccess; | ||||
| case OP_CLOSE: CLOSE4args opclose; | ||||
| case OP_COMMIT: COMMIT4args opcommit; | ||||
| case OP_CREATE: CREATE4args opcreate; | ||||
| case OP_DELEGPURGE: DELEGPURGE4args opdelegpurge; | ||||
| case OP_DELEGRETURN: DELEGRETURN4args opdelegreturn; | ||||
| case OP_GETATTR: GETATTR4args opgetattr; | ||||
| case OP_GETFH: void; | ||||
| case OP_LINK: LINK4args oplink; | ||||
| case OP_LOCK: LOCK4args oplock; | ||||
| case OP_LOCKT: LOCKT4args oplockt; | ||||
| case OP_LOCKU: LOCKU4args oplocku; | ||||
| case OP_LOOKUP: LOOKUP4args oplookup; | ||||
| case OP_LOOKUPP: void; | ||||
| case OP_NVERIFY: NVERIFY4args opnverify; | ||||
| case OP_OPEN: OPEN4args opopen; | ||||
| case OP_OPENATTR: OPENATTR4args opopenattr; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_OPEN_CONFIRM: OPEN_CONFIRM4args opopen_confirm; | ||||
| case OP_OPEN_DOWNGRADE: | ||||
| OPEN_DOWNGRADE4args opopen_downgrade; | ||||
| case OP_PUTFH: PUTFH4args opputfh; | ||||
| case OP_PUTPUBFH: void; | ||||
| case OP_PUTROOTFH: void; | ||||
| case OP_READ: READ4args opread; | ||||
| case OP_READDIR: READDIR4args opreaddir; | ||||
| case OP_READLINK: void; | ||||
| case OP_REMOVE: REMOVE4args opremove; | ||||
| case OP_RENAME: RENAME4args oprename; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_RENEW: RENEW4args oprenew; | ||||
| case OP_RESTOREFH: void; | ||||
| case OP_SAVEFH: void; | ||||
| case OP_SECINFO: SECINFO4args opsecinfo; | ||||
| case OP_SETATTR: SETATTR4args opsetattr; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_SETCLIENTID: SETCLIENTID4args opsetclientid; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_SETCLIENTID_CONFIRM: SETCLIENTID_CONFIRM4args | ||||
| opsetclientid_confirm; | ||||
| case OP_VERIFY: VERIFY4args opverify; | ||||
| case OP_WRITE: WRITE4args opwrite; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_RELEASE_LOCKOWNER: | ||||
| RELEASE_LOCKOWNER4args | ||||
| oprelease_lockowner; | ||||
| /* Operations new to NFSv4.1 */ | ||||
| case OP_BACKCHANNEL_CTL: | ||||
| BACKCHANNEL_CTL4args opbackchannel_ctl; | ||||
| case OP_BIND_CONN_TO_SESSION: | ||||
| BIND_CONN_TO_SESSION4args | ||||
| opbind_conn_to_session; | ||||
| case OP_EXCHANGE_ID: EXCHANGE_ID4args opexchange_id; | ||||
| case OP_CREATE_SESSION: | ||||
| CREATE_SESSION4args opcreate_session; | ||||
| case OP_DESTROY_SESSION: | ||||
| DESTROY_SESSION4args opdestroy_session; | ||||
| case OP_FREE_STATEID: FREE_STATEID4args opfree_stateid; | ||||
| case OP_GET_DIR_DELEGATION: | ||||
| GET_DIR_DELEGATION4args | ||||
| opget_dir_delegation; | ||||
| case OP_GETDEVICEINFO: GETDEVICEINFO4args opgetdeviceinfo; | ||||
| case OP_GETDEVICELIST: GETDEVICELIST4args opgetdevicelist; | ||||
| case OP_LAYOUTCOMMIT: LAYOUTCOMMIT4args oplayoutcommit; | ||||
| case OP_LAYOUTGET: LAYOUTGET4args oplayoutget; | ||||
| case OP_LAYOUTRETURN: LAYOUTRETURN4args oplayoutreturn; | ||||
| case OP_SECINFO_NO_NAME: | ||||
| SECINFO_NO_NAME4args opsecinfo_no_name; | ||||
| case OP_SEQUENCE: SEQUENCE4args opsequence; | ||||
| case OP_SET_SSV: SET_SSV4args opset_ssv; | ||||
| case OP_TEST_STATEID: TEST_STATEID4args optest_stateid; | ||||
| case OP_WANT_DELEGATION: | ||||
| WANT_DELEGATION4args opwant_delegation; | ||||
| case OP_DESTROY_CLIENTID: | ||||
| DESTROY_CLIENTID4args | ||||
| opdestroy_clientid; | ||||
| case OP_RECLAIM_COMPLETE: | ||||
| RECLAIM_COMPLETE4args | ||||
| opreclaim_complete; | ||||
| /* Operations not new to NFSv4.1 */ | ||||
| case OP_ILLEGAL: void; | ||||
| }; | ||||
| struct COMPOUND4args { | ||||
| utf8str_cs tag; | ||||
| uint32_t minorversion; | ||||
| nfs_argop4 argarray<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMPOUND_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union nfs_resop4 switch (nfs_opnum4 resop) { | ||||
| case OP_ACCESS: ACCESS4res opaccess; | ||||
| case OP_CLOSE: CLOSE4res opclose; | ||||
| case OP_COMMIT: COMMIT4res opcommit; | ||||
| case OP_CREATE: CREATE4res opcreate; | ||||
| case OP_DELEGPURGE: DELEGPURGE4res opdelegpurge; | ||||
| case OP_DELEGRETURN: DELEGRETURN4res opdelegreturn; | ||||
| case OP_GETATTR: GETATTR4res opgetattr; | ||||
| case OP_GETFH: GETFH4res opgetfh; | ||||
| case OP_LINK: LINK4res oplink; | ||||
| case OP_LOCK: LOCK4res oplock; | ||||
| case OP_LOCKT: LOCKT4res oplockt; | ||||
| case OP_LOCKU: LOCKU4res oplocku; | ||||
| case OP_LOOKUP: LOOKUP4res oplookup; | ||||
| case OP_LOOKUPP: LOOKUPP4res oplookupp; | ||||
| case OP_NVERIFY: NVERIFY4res opnverify; | ||||
| case OP_OPEN: OPEN4res opopen; | ||||
| case OP_OPENATTR: OPENATTR4res opopenattr; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_OPEN_CONFIRM: OPEN_CONFIRM4res opopen_confirm; | ||||
| case OP_OPEN_DOWNGRADE: | ||||
| OPEN_DOWNGRADE4res | ||||
| opopen_downgrade; | ||||
| case OP_PUTFH: PUTFH4res opputfh; | ||||
| case OP_PUTPUBFH: PUTPUBFH4res opputpubfh; | ||||
| case OP_PUTROOTFH: PUTROOTFH4res opputrootfh; | ||||
| case OP_READ: READ4res opread; | ||||
| case OP_READDIR: READDIR4res opreaddir; | ||||
| case OP_READLINK: READLINK4res opreadlink; | ||||
| case OP_REMOVE: REMOVE4res opremove; | ||||
| case OP_RENAME: RENAME4res oprename; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_RENEW: RENEW4res oprenew; | ||||
| case OP_RESTOREFH: RESTOREFH4res oprestorefh; | ||||
| case OP_SAVEFH: SAVEFH4res opsavefh; | ||||
| case OP_SECINFO: SECINFO4res opsecinfo; | ||||
| case OP_SETATTR: SETATTR4res opsetattr; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_SETCLIENTID: SETCLIENTID4res opsetclientid; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_SETCLIENTID_CONFIRM: | ||||
| SETCLIENTID_CONFIRM4res | ||||
| opsetclientid_confirm; | ||||
| case OP_VERIFY: VERIFY4res opverify; | ||||
| case OP_WRITE: WRITE4res opwrite; | ||||
| /* Not for NFSv4.1 */ | ||||
| case OP_RELEASE_LOCKOWNER: | ||||
| RELEASE_LOCKOWNER4res | ||||
| oprelease_lockowner; | ||||
| /* Operations new to NFSv4.1 */ | ||||
| case OP_BACKCHANNEL_CTL: | ||||
| BACKCHANNEL_CTL4res | ||||
| opbackchannel_ctl; | ||||
| case OP_BIND_CONN_TO_SESSION: | ||||
| BIND_CONN_TO_SESSION4res | ||||
| opbind_conn_to_session; | ||||
| case OP_EXCHANGE_ID: EXCHANGE_ID4res opexchange_id; | ||||
| case OP_CREATE_SESSION: | ||||
| CREATE_SESSION4res | ||||
| opcreate_session; | ||||
| case OP_DESTROY_SESSION: | ||||
| DESTROY_SESSION4res | ||||
| opdestroy_session; | ||||
| case OP_FREE_STATEID: FREE_STATEID4res | ||||
| opfree_stateid; | ||||
| case OP_GET_DIR_DELEGATION: | ||||
| GET_DIR_DELEGATION4res | ||||
| opget_dir_delegation; | ||||
| case OP_GETDEVICEINFO: GETDEVICEINFO4res | ||||
| opgetdeviceinfo; | ||||
| case OP_GETDEVICELIST: GETDEVICELIST4res | ||||
| opgetdevicelist; | ||||
| case OP_LAYOUTCOMMIT: LAYOUTCOMMIT4res oplayoutcommit; | ||||
| case OP_LAYOUTGET: LAYOUTGET4res oplayoutget; | ||||
| case OP_LAYOUTRETURN: LAYOUTRETURN4res oplayoutreturn; | ||||
| case OP_SECINFO_NO_NAME: | ||||
| SECINFO_NO_NAME4res | ||||
| opsecinfo_no_name; | ||||
| case OP_SEQUENCE: SEQUENCE4res opsequence; | ||||
| case OP_SET_SSV: SET_SSV4res opset_ssv; | ||||
| case OP_TEST_STATEID: TEST_STATEID4res optest_stateid; | ||||
| case OP_WANT_DELEGATION: | ||||
| WANT_DELEGATION4res | ||||
| opwant_delegation; | ||||
| case OP_DESTROY_CLIENTID: | ||||
| DESTROY_CLIENTID4res | ||||
| opdestroy_clientid; | ||||
| case OP_RECLAIM_COMPLETE: | ||||
| RECLAIM_COMPLETE4res | ||||
| opreclaim_complete; | ||||
| /* Operations not new to NFSv4.1 */ | ||||
| case OP_ILLEGAL: ILLEGAL4res opillegal; | ||||
| }; | ||||
| struct COMPOUND4res { | ||||
| nfsstat4 status; | ||||
| utf8str_cs tag; | ||||
| nfs_resop4 resarray<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMPOUND_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The COMPOUND procedure is used to combine one or more NFSv4 | ||||
| operations into a | ||||
| single RPC request. The server interprets each of the operations in | ||||
| turn. If an operation is executed by the server and the status of that | ||||
| operation is NFS4_OK, then the next operation in the COMPOUND | ||||
| procedure is executed. The server continues this process until there | ||||
| are no more operations to be executed or until one of the operations has a | ||||
| status value other than NFS4_OK. | ||||
| </t> | ||||
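| <t> | ||||
| The evaluation order described above can be sketched in Python. This is an | ||||
| illustrative sketch, not part of the protocol: the function name run_compound, | ||||
| the use of callables as stand-ins for operation handlers, and the choice of | ||||
| NFS4_OK for an empty operation list are all assumptions made for the example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_OK = "NFS4_OK"


def run_compound(operations):
    """Evaluate operations in order, stopping at the first non-OK status.

    `operations` is a list of zero-argument callables, each returning a
    status string (hypothetical stand-ins for real operation handlers).
    Returns (compound_status, per_operation_statuses).
    """
    results = []
    for op in operations:
        status = op()
        results.append(status)
        if status != NFS4_OK:
            break  # operations after the failing one are not executed
    # Assumption: an empty operation list yields NFS4_OK overall.
    compound_status = results[-1] if results else NFS4_OK
    return compound_status, results


# The third operation is never reached because the second one fails.
status, results = run_compound([
    lambda: NFS4_OK,
    lambda: "NFS4ERR_NOENT",
    lambda: NFS4_OK,
])
```
| ]]></sourcecode> | ||||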
| <t> | ||||
| In the processing of the COMPOUND procedure, the server may find that | ||||
| it does not have the available resources to execute any or all of the | ||||
| operations within the COMPOUND sequence. See | ||||
| <xref target="COMPOUND_Sizing_Issues" format="default"/> for a more detailed discussion. | ||||
| </t> | ||||
| <t> | ||||
| The server will generally choose between two methods of decoding the | ||||
| client's request. The first would be the traditional one-pass XDR | ||||
| decode. If there is an XDR decoding error in this case, the RPC XDR | ||||
| decode error would be returned. The second method would be to make an | ||||
| initial pass to decode the basic COMPOUND request and then to XDR | ||||
| decode the individual operations; the most interesting is the decode | ||||
| of attributes. In this case, the server may encounter an XDR decode | ||||
| error during the second pass. If it does, the server would return | ||||
| the error NFS4ERR_BADXDR to signify the decode error. | ||||
| </t> | ||||
| <t> | ||||
| The COMPOUND arguments contain a "minorversion" field. For NFSv4.1, | ||||
| the value for this field is 1. If the server receives | ||||
| a COMPOUND procedure with a minorversion field value that it does not | ||||
| support, the server <bcp14>MUST</bcp14> return an error of | ||||
| NFS4ERR_MINOR_VERS_MISMATCH and a zero-length resultdata array. | ||||
| </t> | ||||
| <t> | ||||
| Contained within the COMPOUND results is a "status" field. If the | ||||
| results array length is non-zero, this status must be equivalent to | ||||
| the status of the last operation that was executed within the COMPOUND | ||||
| procedure. Therefore, if an operation incurred an error then the | ||||
| "status" value will be the same error value as is being returned for | ||||
| the operation that failed. | ||||
| </t> | ||||
| <t> | ||||
| Note that operations 0 and 1 are not defined for the | ||||
| COMPOUND procedure. Operation 2 is not defined and is reserved for | ||||
| future definition and use with minor versioning. If the server | ||||
| receives an operation array that contains operation 2 and the | ||||
| minorversion field has a value of zero, an error of | ||||
| NFS4ERR_OP_ILLEGAL, as described in the next paragraph, is returned to | ||||
| the client. If an operation array contains an operation 2 and the | ||||
| minorversion field is non-zero and the server does not support the | ||||
| minor version, the server returns an error of | ||||
| NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the | ||||
| NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other | ||||
| errors. | ||||
| </t> | ||||
| <t> | ||||
| It is possible that the server receives a request that contains an | ||||
| operation that is less than the first legal operation (OP_ACCESS) or | ||||
| greater than the last legal operation (OP_RELEASE_LOCKOWNER). In this | ||||
| case, the server's response will encode the opcode OP_ILLEGAL rather | ||||
| than the illegal opcode of the request. The status field in the | ||||
| ILLEGAL return results will be set to NFS4ERR_OP_ILLEGAL. The COMPOUND | ||||
| procedure's return results will also be NFS4ERR_OP_ILLEGAL. | ||||
| </t> | ||||
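| <t> | ||||
| The two checks described above (minor version first, then opcode range) can | ||||
| be sketched as follows. The function names and the idea of passing the legal | ||||
| opcode bounds as parameters are assumptions for illustration; the numeric | ||||
| bounds in the usage lines are placeholders, and the authoritative opcode | ||||
| values are those in the XDR definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
OP_ILLEGAL = 10044  # opcode encoded in the reply for an illegal request opcode
NFS4ERR_OP_ILLEGAL = "NFS4ERR_OP_ILLEGAL"
NFS4ERR_MINOR_VERS_MISMATCH = "NFS4ERR_MINOR_VERS_MISMATCH"


def check_minorversion(minorversion, supported):
    """Return the mismatch error, or None if the minor version is supported.

    This check runs before any operation is examined, which is why the
    mismatch error takes precedence over all other errors.
    """
    if minorversion not in supported:
        return NFS4ERR_MINOR_VERS_MISMATCH
    return None


def reply_for_opcode(opcode, first_legal, last_legal):
    """Return (opcode to encode in the reply, error status or None)."""
    if opcode < first_legal or opcode > last_legal:
        # The reply carries OP_ILLEGAL instead of echoing the bad opcode.
        return OP_ILLEGAL, NFS4ERR_OP_ILLEGAL
    return opcode, None


# Placeholder bounds: 3 for the first legal opcode, 39 for the last.
FIRST_LEGAL, LAST_LEGAL = 3, 39
mismatch = check_minorversion(2, supported={1})
illegal = reply_for_opcode(2, FIRST_LEGAL, LAST_LEGAL)
legal = reply_for_opcode(3, FIRST_LEGAL, LAST_LEGAL)
```
| ]]></sourcecode> | ||||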
| <t> | ||||
| The definition of the "tag" in the request is left to the implementor. | ||||
| It may be used to summarize the content of the Compound request for | ||||
| the benefit of packet-sniffers and engineers debugging | ||||
| implementations. However, the value of "tag" in the response <bcp14>SHOULD</bcp14> | ||||
| be the same value as provided in the request. This applies to the tag | ||||
| field of the CB_COMPOUND procedure as well. | ||||
| </t> | ||||
| <section toc="exclude" anchor="current_filehandle_stateid" numbered="true"> | ||||
| <name>Current Filehandle and Stateid</name> | ||||
| <t> | ||||
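| The COMPOUND procedure offers a simple environment for the | ||||
| execution of the operations specified by the client. Four | ||||
| values make up this environment: the current filehandle, the | ||||
| saved filehandle, the current stateid, and the saved stateid. | ||||
| The first two relate to the filehandle while the second two | ||||
| relate to the stateid. | ||||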
| </t> | ||||
| <section toc="exclude" anchor="current_filehandle" numbered="true"> | ||||
| <name>Current Filehandle</name> | ||||
| <t> | ||||
| The current and saved filehandles are used throughout | ||||
| the protocol. Most operations implicitly use | ||||
| the current filehandle as an argument, and many set | ||||
| the current filehandle as part of the results. | ||||
| The combination of client-specified sequences | ||||
| of operations and current and saved filehandle | ||||
| arguments and results allows for greater protocol | ||||
| flexibility. The best or easiest example of current | ||||
| filehandle usage is a sequence like the following: | ||||
| </t> | ||||
| <figure anchor="curfh_example"> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTFH fh1 {fh1} | ||||
| LOOKUP "compA" {fh2} | ||||
| GETATTR {fh2} | ||||
| LOOKUP "compB" {fh3} | ||||
| GETATTR {fh3} | ||||
| LOOKUP "compC" {fh4} | ||||
| GETATTR {fh4} | ||||
| GETFH]]></sourcecode> | ||||
| </figure> | ||||
| <t> | ||||
| In this example, the PUTFH (<xref target="OP_PUTFH" format="default"/>) operation explicitly sets the current | ||||
| filehandle value while the result of each LOOKUP operation sets | ||||
| the current filehandle value to the resultant file system | ||||
| object. Also, the client is able to insert GETATTR operations | ||||
| using the current filehandle as an argument. | ||||
| </t> | ||||
| <t> | ||||
| The PUTROOTFH (<xref target="OP_PUTROOTFH" format="default"/>) and | ||||
| PUTPUBFH (<xref target="OP_PUTPUBFH" format="default"/>) operations also set the | ||||
| current filehandle. The above example would replace "PUTFH fh1" with | ||||
| PUTROOTFH or PUTPUBFH with no filehandle argument in order to | ||||
| achieve the same effect (on the assumption that "compA" is directly | ||||
| below the root of the namespace). | ||||
| </t> | ||||
| <t> | ||||
| Along with the current filehandle, there is a saved filehandle. | ||||
| While the current filehandle is set as the result of | ||||
| operations like LOOKUP, the saved filehandle must be set | ||||
| directly with the use of the SAVEFH operation. The SAVEFH | ||||
| operation copies the current filehandle value to the saved | ||||
| value. The saved filehandle value is used in combination with | ||||
| the current filehandle value for the LINK and RENAME | ||||
| operations. The RESTOREFH operation will copy the saved filehandle value to the current filehandle value; as a result, the | ||||
| saved filehandle value may be used as a sort of "scratch" area for | ||||
| the client's series of operations. | ||||
| </t> | ||||
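| <t> | ||||
| The interplay of the current and saved filehandles can be modeled with a | ||||
| small Python class. This is a toy model under stated assumptions: filehandles | ||||
| are represented as strings, LOOKUP is simulated by appending a component | ||||
| name, and the class and method names are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
class FilehandleState:
    """Toy model of the current and saved filehandle slots."""

    def __init__(self):
        self.current = None
        self.saved = None

    def putfh(self, fh):
        # PUTFH explicitly sets the current filehandle.
        self.current = fh

    def lookup(self, name):
        # LOOKUP replaces the current filehandle with the resulting object.
        # A real server resolves the name; a joined string stands in here.
        self.current = f"{self.current}/{name}"

    def savefh(self):
        # SAVEFH copies the current filehandle to the saved slot.
        self.saved = self.current

    def restorefh(self):
        # RESTOREFH copies the saved filehandle back to the current slot.
        self.current = self.saved


env = FilehandleState()
env.putfh("fh1")
env.savefh()             # remember fh1 as a scratch value
env.lookup("compA")      # current filehandle now names compA
after_lookup = env.current
env.restorefh()          # current filehandle is fh1 again
```
| ]]></sourcecode> | ||||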
| </section> | ||||
| <section toc="exclude" anchor="current_stateid" numbered="true"> | ||||
| <name>Current Stateid</name> | ||||
| <t> | ||||
| With NFSv4.1, additions of a current stateid and a saved stateid | ||||
| have been made to the COMPOUND processing environment; this | ||||
| allows for the passing of stateids between operations. There | ||||
| are no changes to the syntax of the protocol, only changes to | ||||
| the semantics of a few operations. | ||||
| </t> | ||||
| <t> | ||||
| A "current stateid" is the stateid that is associated | ||||
| with the current filehandle. The current stateid | ||||
| may only be changed by an operation that modifies | ||||
| the current filehandle or returns a stateid. If an | ||||
| operation returns a stateid, it <bcp14>MUST</bcp14> set the current | ||||
| stateid to the returned value. If an operation sets | ||||
| the current filehandle but does not return a stateid, | ||||
| the current stateid <bcp14>MUST</bcp14> be set to the all-zeros | ||||
| special stateid, i.e., (seqid, other) = (0, 0). | ||||
| If an operation uses a stateid as an argument but does | ||||
| not return a stateid, the current stateid <bcp14>MUST NOT</bcp14> be | ||||
| changed. | ||||
| For example, PUTFH, PUTROOTFH, and PUTPUBFH | ||||
| will change the current server state from {ocfh, | ||||
| (osid)} to {cfh, (0, 0)}, while LOCK will change the current | ||||
| state from {cfh, (osid)} to {cfh, (nsid)}. Operations like | ||||
| LOOKUP that transform a current filehandle and | ||||
| component name into a new current filehandle will also | ||||
| change the current stateid to (0, 0). The SAVEFH | ||||
| and RESTOREFH operations will save and restore both | ||||
| the current filehandle and the current stateid as a set. | ||||
| </t> | ||||
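| <t> | ||||
| The three rules for updating the current stateid can be condensed into one | ||||
| function. The sketch below is illustrative only: the function name, its | ||||
| parameters, and the representation of a stateid as a (seqid, other) tuple | ||||
| of small integers are assumptions, not protocol definitions. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
ALL_ZEROS = (0, 0)  # the all-zeros special stateid, (seqid, other)


def next_current_stateid(current, sets_filehandle, returned_stateid):
    """Apply the current-stateid update rules for one operation.

    current          -- current stateid before the operation
    sets_filehandle  -- True if the operation sets the current filehandle
    returned_stateid -- stateid returned by the operation, or None
    """
    if returned_stateid is not None:
        return returned_stateid  # a returned stateid becomes current
    if sets_filehandle:
        return ALL_ZEROS         # new filehandle, no stateid returned
    return current               # e.g. READ: stateid used but not returned


old = (7, 42)                                               # toy values
after_putfh = next_current_stateid(old, True, None)         # like PUTFH
after_lock = next_current_stateid(after_putfh, False, (1, 99))  # like LOCK
after_read = next_current_stateid(after_lock, False, None)  # like READ
```
| ]]></sourcecode> | ||||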
| <t> | ||||
| The following example is the common case of a simple READ | ||||
| operation with a normal stateid showing that the PUTFH | ||||
| initializes the current stateid to (0, 0). The subsequent READ | ||||
| with stateid (sid1) leaves the current stateid unchanged. | ||||
| </t> | ||||
| <figure anchor="csid_example1"> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTFH fh1 - -> {fh1, (0, 0)} | ||||
| READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)}]]></sourcecode> | ||||
| </figure> | ||||
| <t> | ||||
| This next example performs an OPEN with the root | ||||
| filehandle and, as a result, generates stateid (sid1). The next | ||||
| operation specifies the READ with the argument stateid set such | ||||
| that (seqid, other) are equal to (1, 0), | ||||
| but the current stateid set by the previous operation is | ||||
| actually used when the operation is evaluated. This allows correct | ||||
| interaction with any existing, potentially conflicting, | ||||
| locks. | ||||
| </t> | ||||
| <figure anchor="csid_example2"> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTROOTFH - -> {fh1, (0, 0)} | ||||
| OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} | ||||
| READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} | ||||
| CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)}]]></sourcecode> | ||||
| </figure> | ||||
| <t> | ||||
| This next example is similar to the second in how | ||||
| it passes the stateid sid2 generated by the LOCK | ||||
| operation to the next READ operation. This allows | ||||
| the client to explicitly surround a single I/O | ||||
| operation with a lock and its appropriate stateid to | ||||
| guarantee correctness with other client locks. The | ||||
| example also shows how SAVEFH and RESTOREFH can | ||||
| save and later reuse a filehandle and stateid, passing them as the | ||||
| current filehandle and stateid to a READ operation. | ||||
| </t> | ||||
| <figure anchor="csid_example3"> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTFH fh1 - -> {fh1, (0, 0)} | ||||
| LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)} | ||||
| READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} | ||||
| LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} | ||||
| SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} | ||||
| PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} | ||||
| WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} | ||||
| RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} | ||||
| READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)}]]></sourcecode> | ||||
| </figure> | ||||
| <t> | ||||
| The final example shows a disallowed use of | ||||
| the current stateid. The client is attempting | ||||
| to implicitly pass an anonymous special stateid, (0,0), to | ||||
| the READ operation. The server <bcp14>MUST</bcp14> return NFS4ERR_BAD_STATEID | ||||
| in the reply to the READ operation. | ||||
| </t> | ||||
| <figure anchor="csid_example4"> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTFH fh1 - -> {fh1, (0, 0)} | ||||
| READ (1, 0), 0, 1024 {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID]]></sourcecode> | ||||
| </figure> | ||||
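| <t> | ||||
| The substitution shown in the examples, where an argument stateid of (1, 0) | ||||
| stands for the current stateid, might be sketched as below. This follows | ||||
| the final example only; the function and exception names are hypothetical, | ||||
| and stateids are again modeled as (seqid, other) tuples of small integers. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
USE_CURRENT = (1, 0)  # argument value meaning "use the current stateid"
ALL_ZEROS = (0, 0)    # anonymous special stateid


class BadStateid(Exception):
    """Stands in for an NFS4ERR_BAD_STATEID reply."""


def resolve_stateid(argument, current):
    """Return the stateid an operation should actually evaluate."""
    if argument != USE_CURRENT:
        return argument  # an explicit stateid is used as given
    if current == ALL_ZEROS:
        # Implicitly passing the anonymous stateid is disallowed.
        raise BadStateid("NFS4ERR_BAD_STATEID")
    return current


# After OPEN set the current stateid, READ (1, 0) resolves to it.
resolved = resolve_stateid(USE_CURRENT, (5, 77))
```
| ]]></sourcecode> | ||||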
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMPOUND_ERRORS" numbered="true"> | ||||
| <name>ERRORS</name> | ||||
| <t> | ||||
| COMPOUND can return any error that an operation on | ||||
| the fore channel can return (see <xref target="op_error_returns" format="default"/>). | ||||
| However, if COMPOUND returns zero operations, the error | ||||
| returned by COMPOUND is unrelated to any error returned by | ||||
| an operation. The list of errors COMPOUND will return if it processes | ||||
| zero operations includes: | ||||
| </t> | ||||
| <table anchor="compounderrs" align="center"> | ||||
| <name>COMPOUND Error Returns</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Error</th> | ||||
| <th align="left">Notes</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADCHAR</td> | ||||
| <td align="left">The tag argument has a character the replier | ||||
| does not support. </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADXDR</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELAY</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_INVAL</td> | ||||
| <td align="left">The tag argument is not in UTF-8 encoding.</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MINOR_VERS_MISMATCH</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SERVERFAULT</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOO_MANY_OPS</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG_TO_CACHE</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REQ_TOO_BIG</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="operation_mandlist" numbered="true" toc="default"> | ||||
| <name>Operations: <bcp14>REQUIRED</bcp14>, <bcp14>RECOMMENDED</bcp14>, or <bcp14>OPTIONAL</bcp14></name> | ||||
| <t> | ||||
| The following tables summarize the operations of the NFSv4.1 | ||||
| protocol and the corresponding designation of <bcp14>REQUIRED</bcp14>, | ||||
| <bcp14>RECOMMENDED</bcp14>, and <bcp14>OPTIONAL</bcp14> to implement or <bcp14>MUST NOT</bcp14> implement. The | ||||
| designation of <bcp14>MUST NOT</bcp14> implement is reserved for those operations | ||||
| that were defined in NFSv4.0 and <bcp14>MUST NOT</bcp14> be implemented in NFSv4.1. | ||||
| </t> | ||||
| <t> | ||||
| For the most part, the <bcp14>REQUIRED</bcp14>, <bcp14>RECOMMENDED</bcp14>, or <bcp14>OPTIONAL</bcp14> designation for | ||||
| operations sent by the client is for | ||||
| the server implementation. The client is generally required to | ||||
| implement the operations needed for the operating environment | ||||
| that it serves. For example, a read-only NFSv4.1 client would | ||||
| have no need to implement the WRITE operation and is not required | ||||
| to do so. | ||||
| </t> | ||||
| <t> | ||||
| The <bcp14>REQUIRED</bcp14> or <bcp14>OPTIONAL</bcp14> designation for | ||||
| callback operations sent by the server is for both the client | ||||
| and server. Generally, the client has the option of | ||||
| creating the backchannel and sending the operations on the | ||||
| fore channel that will be a catalyst for the server sending | ||||
| callback operations. A partial | ||||
| exception is CB_RECALL_SLOT; the only way the client can | ||||
| avoid supporting this operation is by not creating a backchannel. | ||||
| </t> | ||||
| <t> | ||||
| Since this is a summary of the operations and their designation, | ||||
| there are subtleties that are not presented here. Therefore, if | ||||
| there is a question of the requirements of implementation, the | ||||
| operation descriptions themselves must be consulted along with | ||||
| other relevant explanatory text within this specification. | ||||
| </t> | ||||
| <t> | ||||
| The abbreviations used in the second and third columns of the table | ||||
| are defined as follows. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>REQ</dt> | ||||
| <dd> | ||||
| <bcp14>REQUIRED</bcp14> to implement | ||||
| </dd> | ||||
| <dt>REC</dt> | ||||
| <dd> | ||||
| <bcp14>RECOMMENDED</bcp14> to implement | ||||
| </dd> | ||||
| <dt>OPT</dt> | ||||
| <dd> | ||||
| <bcp14>OPTIONAL</bcp14> to implement | ||||
| </dd> | ||||
| <dt>MNI</dt> | ||||
| <dd> | ||||
| <bcp14>MUST NOT</bcp14> implement | ||||
| </dd> | ||||
| </dl> | ||||
| <t> For the NFSv4.1 features that are <bcp14>OPTIONAL</bcp14>, the operations that | ||||
| support those features are <bcp14>OPTIONAL</bcp14>, and the server would return | ||||
| NFS4ERR_NOTSUPP in response to the client's use of those | ||||
| operations. If an <bcp14>OPTIONAL</bcp14> feature is supported, it is possible | ||||
| that a set of operations related to the feature becomes <bcp14>REQUIRED</bcp14> | ||||
| to implement. The third column of the table designates the | ||||
| feature(s) and whether the operation is <bcp14>REQUIRED</bcp14> or <bcp14>OPTIONAL</bcp14> in the | ||||
| presence of support for the feature. | ||||
| </t> | ||||
| <t> | ||||
| The <bcp14>OPTIONAL</bcp14> features identified and their abbreviations are as | ||||
| follows: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>pNFS</dt> | ||||
| <dd> | ||||
| Parallel NFS | ||||
| </dd> | ||||
| <dt>FDELG</dt> | ||||
| <dd> | ||||
| File Delegations | ||||
| </dd> | ||||
| <dt>DDELG</dt> | ||||
| <dd> | ||||
| Directory Delegations | ||||
| </dd> | ||||
| </dl> | ||||
| <table align="center"> | ||||
| <name>Operations</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Operation</th> | ||||
| <th align="left">REQ, REC, OPT, or MNI</th> | ||||
| <th align="left">Feature (REQ, REC, or OPT)</th> | ||||
| <th align="left">Definition</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left"> ACCESS </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_ACCESS" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> BACKCHANNEL_CTL </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_BACKCHANNEL_CTL" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> BIND_CONN_TO_SESSION</td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_BIND_CONN_TO_SESSION" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CLOSE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_CLOSE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> COMMIT </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_COMMIT" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CREATE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_CREATE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CREATE_SESSION </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_CREATE_SESSION" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> DELEGPURGE </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_DELEGPURGE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> DELEGRETURN </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG, DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_DELEGRETURN" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> DESTROY_CLIENTID </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_DESTROY_CLIENTID" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> DESTROY_SESSION </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_DESTROY_SESSION" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> EXCHANGE_ID </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_EXCHANGE_ID" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> FREE_STATEID </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_FREE_STATEID" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> GETATTR </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_GETATTR" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> GETDEVICEINFO </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_GETDEVICEINFO" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> GETDEVICELIST</td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (OPT)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_GETDEVICELIST" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> GETFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_GETFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> GET_DIR_DELEGATION </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">DDELG (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_GET_DIR_DELEGATION" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LAYOUTCOMMIT </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_LAYOUTCOMMIT" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LAYOUTGET </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_LAYOUTGET" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LAYOUTRETURN </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_LAYOUTRETURN" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LINK </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LINK" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LOCK </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LOCK" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LOCKT </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LOCKT" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LOCKU </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LOCKU" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LOOKUP </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LOOKUP" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> LOOKUPP </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_LOOKUPP" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> NVERIFY </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_NVERIFY" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> OPEN </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_OPEN" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> OPENATTR </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_OPENATTR" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> OPEN_CONFIRM </td> | ||||
| <td align="left">MNI</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> N/A </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> OPEN_DOWNGRADE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_OPEN_DOWNGRADE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> PUTFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_PUTFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> PUTPUBFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_PUTPUBFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> PUTROOTFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_PUTROOTFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> READ </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_READ" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> READDIR </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_READDIR" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> READLINK </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_READLINK" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> RECLAIM_COMPLETE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_RECLAIM_COMPLETE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> RELEASE_LOCKOWNER</td> | ||||
| <td align="left">MNI</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> N/A </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> REMOVE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_REMOVE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> RENAME </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_RENAME" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> RENEW </td> | ||||
| <td align="left">MNI</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> N/A </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> RESTOREFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_RESTOREFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SAVEFH </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_SAVEFH" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SECINFO </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_SECINFO" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SECINFO_NO_NAME </td> | ||||
| <td align="left">REC</td> | ||||
| <td align="left">pNFS file layout (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_SECINFO_NO_NAME" format="default"/>, | ||||
| <xref target="file_security_considerations" format="default"/> | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SEQUENCE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_SEQUENCE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SETATTR </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_SETATTR" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SETCLIENTID</td> | ||||
| <td align="left">MNI</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> N/A </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SETCLIENTID_CONFIRM</td> | ||||
| <td align="left">MNI</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> N/A </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> SET_SSV</td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_SET_SSV" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> TEST_STATEID </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_TEST_STATEID" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> VERIFY </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_VERIFY" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> WANT_DELEGATION</td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG (OPT)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_WANT_DELEGATION" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> WRITE </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_WRITE" format="default"/> </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
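| <t> | ||||
| One way to read the second and third columns together is sketched below in | ||||
| Python, using a few rows copied from the table above. The dictionary layout | ||||
| and the function name are assumptions made for illustration; the operation | ||||
| descriptions remain the authority on implementation requirements. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# operation -> (base designation, {feature: designation if feature supported})
OPERATIONS = {
    "ACCESS": ("REQ", {}),
    "GETDEVICEINFO": ("OPT", {"pNFS": "REQ"}),
    "GETDEVICELIST": ("OPT", {"pNFS": "OPT"}),
    "DELEGRETURN": ("OPT", {"FDELG": "REQ", "DDELG": "REQ", "pNFS": "REQ"}),
    "OPEN_CONFIRM": ("MNI", {}),
}


def effective_designation(operation, supported_features):
    """Return the designation given the OPTIONAL features a server supports."""
    base, by_feature = OPERATIONS[operation]
    for feature in supported_features:
        if by_feature.get(feature) == "REQ":
            return "REQ"  # supporting the feature makes the operation REQUIRED
    return base


with_pnfs = effective_designation("GETDEVICEINFO", {"pNFS"})
without_pnfs = effective_designation("GETDEVICEINFO", set())
```
| ]]></sourcecode> | ||||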
| <table align="center"> | ||||
| <name>Callback Operations</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Operation</th> | ||||
| <th align="left">REQ, REC, OPT, or MNI</th> | ||||
| <th align="left">Feature (REQ, REC, or OPT)</th> | ||||
| <th align="left">Definition</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left"> CB_GETATTR </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_GETATTR" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_LAYOUTRECALL </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_LAYOUTRECALL" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_NOTIFY </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">DDELG (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_NOTIFY" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_NOTIFY_DEVICEID </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">pNFS (OPT)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_NOTIFY_LOCK </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_NOTIFY_LOCK" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_PUSH_DELEG </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG (OPT)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_PUSH_DELEG" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_RECALL </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG, DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_RECALL" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_RECALL_ANY </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG, DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_RECALL_ANY" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_RECALL_SLOT </td> | ||||
| <td align="left">REQ</td> | ||||
| <td align="left"/> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_RECALL_SLOT" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_RECALLABLE_OBJ_AVAIL </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_RECALLABLE_OBJ_AVAIL" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_SEQUENCE </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG, DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_SEQUENCE" format="default"/> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> CB_WANTS_CANCELLED </td> | ||||
| <td align="left">OPT</td> | ||||
| <td align="left">FDELG, DDELG, pNFS (REQ)</td> | ||||
| <td align="left"> | ||||
| <xref target="OP_CB_WANTS_CANCELLED" format="default"/> </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="nfsv41operations" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Operations</name> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_ACCESS" numbered="true" toc="default"> | ||||
| <name>Operation 3: ACCESS - Check Access Rights</name> | ||||
| <section toc="exclude" anchor="OP_ACCESS_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const ACCESS4_READ = 0x00000001; | ||||
| const ACCESS4_LOOKUP = 0x00000002; | ||||
| const ACCESS4_MODIFY = 0x00000004; | ||||
| const ACCESS4_EXTEND = 0x00000008; | ||||
| const ACCESS4_DELETE = 0x00000010; | ||||
| const ACCESS4_EXECUTE = 0x00000020; | ||||
| struct ACCESS4args { | ||||
| /* CURRENT_FH: object */ | ||||
| uint32_t access; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_ACCESS_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct ACCESS4resok { | ||||
| uint32_t supported; | ||||
| uint32_t access; | ||||
| }; | ||||
| union ACCESS4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| ACCESS4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_ACCESS_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| ACCESS determines the access rights that a user, as identified by the | ||||
| credentials in the RPC request, has with respect to the file system | ||||
| object specified by the current filehandle. The client encodes the | ||||
| set of access rights that are to be checked in the bit mask "access". | ||||
| The server checks the permissions encoded in the bit mask. If a | ||||
| status of NFS4_OK is returned, two bit masks are included in the | ||||
| response. The first, "supported", represents the access rights | ||||
| that the server can verify reliably. The second, "access", | ||||
| represents the access rights available to the user for the filehandle | ||||
| provided. On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| Note that the reply's supported and access fields <bcp14>MUST NOT</bcp14> | ||||
| contain more values than originally set in the request's | ||||
| access field. For example, if the client sends an ACCESS | ||||
| operation with just the ACCESS4_READ value set and the | ||||
| server supports this value, the server <bcp14>MUST NOT</bcp14> set more | ||||
| than ACCESS4_READ in the supported field even if it could | ||||
| have reliably checked other values. | ||||
| </t> | ||||
| <t> | ||||
| The reply's access field <bcp14>MUST NOT</bcp14> contain more values than the | ||||
| supported field. | ||||
| </t> | ||||
| <t> | ||||
| The results of this operation are necessarily advisory in nature. A | ||||
| return status of NFS4_OK and the appropriate bit set in the bit mask | ||||
| do not imply that such access will be allowed to the file system | ||||
| object in the future. This is because access rights can be revoked by | ||||
| the server at any time. | ||||
| </t> | ||||
| <t> | ||||
| The following access permissions may be requested: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>ACCESS4_READ</dt> | ||||
| <dd> | ||||
| Read data from file or read a directory. | ||||
| </dd> | ||||
| <dt>ACCESS4_LOOKUP</dt> | ||||
| <dd> | ||||
| Look up a name in a directory (no meaning for non-directory objects). | ||||
| </dd> | ||||
| <dt>ACCESS4_MODIFY</dt> | ||||
| <dd> | ||||
| Rewrite existing file data or modify existing directory entries. | ||||
| </dd> | ||||
| <dt>ACCESS4_EXTEND</dt> | ||||
| <dd> | ||||
| Write new data or add directory entries. | ||||
| </dd> | ||||
| <dt>ACCESS4_DELETE</dt> | ||||
| <dd> | ||||
| Delete an existing directory entry. | ||||
| </dd> | ||||
| <dt>ACCESS4_EXECUTE</dt> | ||||
| <dd> | ||||
| Execute a regular file (no meaning for a directory). | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| ACCESS4_EXECUTE is a challenging semantic to implement because | ||||
| NFS provides remote file access, not remote | ||||
| execution. This leads to the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Whether or not a regular file is executable ought to be | ||||
| the responsibility of the NFS client and not the server. And yet | ||||
| the ACCESS operation is specified to seemingly require a server to | ||||
| own that responsibility. | ||||
| </li> | ||||
| <li> | ||||
| When a client executes a regular file, it has to | ||||
| read the file from the server. Strictly speaking, | ||||
| the server should not allow the client to read a file | ||||
| being executed unless the user has read permissions | ||||
| on the file. Requiring | ||||
| explicit read permissions on executable files in order to | ||||
| access them over NFS is not going to be acceptable to | ||||
| some users and storage administrators. Historically, NFS servers have allowed | ||||
| a user to READ a file if the user has execute access | ||||
| to the file. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As a practical example, the UNIX specification <xref target="access_api" format="default"/> states that an implementation | ||||
| claiming conformance to UNIX may indicate in the | ||||
| access() programming interface's result that a | ||||
| privileged user has execute rights, even if no | ||||
| execute permission bits are set on the regular file's | ||||
| attributes. It is possible to claim conformance | ||||
| to the UNIX specification and instead not indicate | ||||
| execute rights in that situation, which is true for | ||||
| some operating environments. Suppose the operating | ||||
| environments of the client and server are implementing | ||||
| the access() semantics for privileged users differently, | ||||
| and the ACCESS operation implementations of the client | ||||
| and server follow their respective access() semantics. | ||||
| This can cause undesired behavior: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Suppose the client's access() interface returns X_OK | ||||
| if the user is privileged and no execute permission | ||||
| bits are set on the regular file's attribute, and the | ||||
| server's access() interface does not return X_OK in | ||||
| that situation. Then the client will be unable to | ||||
| execute files stored on the NFS server that could be | ||||
| executed if stored on a non-NFS file system. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Suppose the client's access() interface does | ||||
| not return X_OK if the user is privileged, and no | ||||
| execute permission bits are set on the regular file's | ||||
| attribute, and the server's access() interface does | ||||
| return X_OK in that situation. Then: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client will be able to execute files stored on | ||||
| the NFS server that could be executed if stored on | ||||
| a non-NFS file system, unless the client's execution | ||||
| subsystem also checks for execute permission bits. | ||||
| </li> | ||||
| <li> | ||||
| Even if the execution subsystem is checking for | ||||
| execute permission bits, there are more potential | ||||
| issues. For example, suppose the client is invoking access() | ||||
| to build a "path search table" of all executable | ||||
| files in the user's "search path", where the path | ||||
| is a list of directories each containing executable | ||||
| files. Suppose there are two files, each in a separate | ||||
| directory of the search path, such that the files have | ||||
| the same component name. In the first directory | ||||
| the file has no execute permission bits set, | ||||
| and in the second directory the file has execute | ||||
| bits set. The path search table will indicate that | ||||
| the first directory has the executable file, but | ||||
| the execute subsystem will fail to execute it. The | ||||
| command shell might fail to try the second file in | ||||
| the second directory. And even if it did, this is | ||||
| a potential performance issue. Clearly, the desired | ||||
| outcome for the client is for the path search table | ||||
| to not contain the first file. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| To deal with the problems described above, the "smart client, | ||||
| stupid server" principle is used. The client owns overall | ||||
| responsibility for determining execute access and | ||||
| relies on the server to parse the execution permissions | ||||
| within the file's mode, acl, and dacl attributes. The | ||||
| rules for the client and server follow: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the client is sending ACCESS in order to determine | ||||
| if the user can read the file, the client <bcp14>SHOULD</bcp14> | ||||
| set ACCESS4_READ in the request's access field. | ||||
| </li> | ||||
| <li> | ||||
| If the client's operating environment only grants | ||||
| execution to the user if the user has execute access | ||||
| according to the execute permissions in the mode, | ||||
| acl, and dacl attributes, then if the client wants | ||||
| to determine execute access, the client <bcp14>SHOULD</bcp14> send | ||||
| an ACCESS request with ACCESS4_EXECUTE bit set in the | ||||
| request's access field. | ||||
| </li> | ||||
| <li> | ||||
| If the client's operating environment grants execution | ||||
| to the user even if the user does not have execute | ||||
| access according to the execute permissions in the | ||||
| mode, acl, and dacl attributes, then if the client | ||||
| wants to determine execute access, it <bcp14>SHOULD</bcp14> send | ||||
| an ACCESS request with both the ACCESS4_EXECUTE and | ||||
| ACCESS4_READ bits set in the request's access field. This | ||||
| way, if any read or execute permission grants the user | ||||
| read or execute access (or if the server interprets | ||||
| the user as privileged), as indicated by the presence | ||||
| of ACCESS4_EXECUTE and/or ACCESS4_READ in the reply's | ||||
| access field, the client will be able to grant the | ||||
| user execute access to the file. | ||||
| </li> | ||||
| <li> | ||||
| If the server supports execute permission bits, or some other | ||||
| method for denoting executability (e.g., the suffix of the name | ||||
| of the file might indicate execute), it <bcp14>MUST</bcp14> check | ||||
| only execute permissions, not read permissions, when determining | ||||
| whether or not the reply will have ACCESS4_EXECUTE set in the access | ||||
| field. | ||||
| The server <bcp14>MUST NOT</bcp14> also examine read permission bits when | ||||
| determining whether or not the reply will have ACCESS4_EXECUTE | ||||
| set in the access field. Even if the server's | ||||
| operating environment would grant execute access to the | ||||
| user (e.g., the user is privileged), the server <bcp14>MUST | ||||
| NOT</bcp14> reply with ACCESS4_EXECUTE set in the reply's access | ||||
| field unless there is at least one execute permission | ||||
| bit set in the mode, acl, or dacl attributes. In the | ||||
| case of acl and dacl, the "one execute permission bit" | ||||
| <bcp14>MUST</bcp14> be an ACE4_EXECUTE bit set in an ALLOW ACE. | ||||
| </li> | ||||
| <li> | ||||
| If the server does not support execute permission | ||||
| bits or some other method for denoting executability, it <bcp14>MUST NOT</bcp14> set ACCESS4_EXECUTE in the | ||||
| reply's supported and access fields. If the client | ||||
| set ACCESS4_EXECUTE in the ACCESS request's access | ||||
| field, and ACCESS4_EXECUTE is not set in the reply's | ||||
| supported field, then the client will have to send | ||||
| an ACCESS request with the ACCESS4_READ bit set in | ||||
| the request's access field. | ||||
| </li> | ||||
| <li> | ||||
| If the server supports read permission bits, it <bcp14>MUST</bcp14> | ||||
| only check for read permissions in the mode, acl, | ||||
| and dacl attributes when it receives an ACCESS request | ||||
| with ACCESS4_READ set in the access field. The server | ||||
| <bcp14>MUST NOT</bcp14> also examine execute permission bits when | ||||
| determining whether the reply will have ACCESS4_READ | ||||
| set in the access field or not. | ||||
| </li> | ||||
| </ul> | ||||
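| <t> | ||||
| The server-side rules above, together with the constraints on the | ||||
| supported and access fields, can be sketched as follows. This is an | ||||
| illustrative Python sketch only, not part of the protocol definition. | ||||
| The function names and the "checkable" and "granted" inputs are | ||||
| assumptions made for the example; the constants are those of the | ||||
| ACCESS4args definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
# Constants from the ACCESS4args XDR definition.
ACCESS4_READ = 0x00000001
ACCESS4_LOOKUP = 0x00000002
ACCESS4_MODIFY = 0x00000004
ACCESS4_EXTEND = 0x00000008
ACCESS4_DELETE = 0x00000010
ACCESS4_EXECUTE = 0x00000020


def execute_right(execute_bit_set, user_is_privileged):
    """Return ACCESS4_EXECUTE only if an execute permission bit is set.

    user_is_privileged is accepted only to show that it is ignored:
    privilege alone does not justify ACCESS4_EXECUTE in the reply.
    """
    return ACCESS4_EXECUTE if execute_bit_set else 0


def access_reply(requested, checkable, granted):
    """Return the (supported, access) pair for an ACCESS4resok.

    checkable: rights this (hypothetical) server can verify reliably.
    granted:   rights its permission check grants to the user.
    """
    # supported never contains a bit the client did not request.
    supported = requested & checkable
    # access never contains a bit that is missing from supported.
    access = supported & granted
    return supported, access
```

| ]]></sourcecode> | ||||
| <t> | ||||
| For instance, a request for ACCESS4_READ alone yields a supported | ||||
| field of at most ACCESS4_READ, regardless of what else the server | ||||
| could have checked. | ||||
| </t> | ||||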
| <t> | ||||
| Note that if the ACCESS reply has ACCESS4_READ | ||||
| or ACCESS4_EXECUTE set, then the user also has | ||||
| permissions to OPEN (<xref target="OP_OPEN" format="default"/>) or | ||||
| READ (<xref target="OP_READ" format="default"/>) the file. In other words, if | ||||
| the client sends an ACCESS request with the ACCESS4_READ | ||||
| and ACCESS4_EXECUTE bits set in the access field (or two | ||||
| separate requests, one with ACCESS4_READ set and the | ||||
| other with ACCESS4_EXECUTE set), and the reply has | ||||
| just ACCESS4_EXECUTE set in the access field (or just | ||||
| one reply has ACCESS4_EXECUTE set), then the user has | ||||
| authorization to OPEN or READ the file. | ||||
| </t> | ||||
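| <t> | ||||
| From the client's side, the choice of request mask and the reading of | ||||
| the reply can be sketched as follows. This is a minimal Python | ||||
| illustration under stated assumptions: both function names and the | ||||
| os_requires_execute_bits flag are hypothetical and merely encode | ||||
| whether the client's operating environment grants execution only when | ||||
| execute permission bits are set. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
# Constants from the ACCESS4args XDR definition.
ACCESS4_READ = 0x00000001
ACCESS4_EXECUTE = 0x00000020


def execute_probe_mask(os_requires_execute_bits):
    """Pick the access mask to send when probing for execute access."""
    if os_requires_execute_bits:
        # Execution depends strictly on the execute permission bits.
        return ACCESS4_EXECUTE
    # Execution may be granted without execute bits, so ask for both.
    return ACCESS4_EXECUTE | ACCESS4_READ


def may_open_or_read(reply_access):
    """True if the reply's access field implies OPEN/READ authorization."""
    return bool(reply_access & (ACCESS4_READ | ACCESS4_EXECUTE))
```

| ]]></sourcecode> | ||||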
| </section> | ||||
| <section toc="exclude" anchor="OP_ACCESS_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| In general, it is not sufficient for the client to attempt to deduce | ||||
| access permissions by inspecting the uid, gid, and mode fields in the | ||||
| file attributes or by attempting to interpret the contents of the ACL | ||||
| attribute. This is because the server may perform uid or gid mapping | ||||
| or enforce additional access-control restrictions. It is also | ||||
| possible that the server may not be in the same ID space as the | ||||
| client. In these cases (and perhaps others), the client cannot | ||||
| reliably perform an access check with only current file attributes. | ||||
| </t> | ||||
| <t> | ||||
| In the NFSv2 protocol, the only reliable way to determine | ||||
| whether an operation was allowed was to try it and see if it succeeded | ||||
| or failed. Using the ACCESS operation in the NFSv4.1 protocol, | ||||
| the client can ask the server to indicate whether or not one or more | ||||
| classes of operations are permitted. The ACCESS operation is provided | ||||
| to allow clients to check before doing a series of operations that | ||||
| will result in an access failure. The OPEN operation provides a point | ||||
| where the server can verify access to the file object and a method to | ||||
| return that information to the client. The ACCESS operation is still | ||||
| useful for directory operations or for use in the case that the UNIX interface | ||||
| access() is used on the client. | ||||
| </t> | ||||
| <t> | ||||
| The information returned by the server in response to an ACCESS call | ||||
| is not permanent. It was correct at the exact time that the server | ||||
| performed the checks, but not necessarily afterwards. The server can | ||||
| revoke access permission at any time. | ||||
| </t> | ||||
| <t> | ||||
| The client should use the effective credentials of the user to build | ||||
| the authentication information in the ACCESS request used to determine | ||||
| access rights. It is the effective user and group credentials that | ||||
| are used in subsequent READ and WRITE operations. | ||||
| </t> | ||||
| <t> | ||||
| Many implementations do not directly support the ACCESS4_DELETE | ||||
| permission. Operating systems like UNIX will ignore the ACCESS4_DELETE | ||||
| bit if set on an access request on a non-directory object. In these | ||||
| systems, delete permission on a file is determined by the access | ||||
| permissions on the directory in which the file resides, instead of | ||||
| being determined by the permissions of the file itself. Therefore, | ||||
| the mask returned enumerating which access rights can be determined | ||||
| will have the ACCESS4_DELETE value set to 0. This indicates to the | ||||
| client that the server was unable to check that particular access | ||||
| right. The ACCESS4_DELETE bit in the access mask returned will then be | ||||
| ignored by the client. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CLOSE" numbered="true" toc="default"> | ||||
| <name>Operation 4: CLOSE - Close File</name> | ||||
| <section toc="exclude" anchor="OP_CLOSE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CLOSE4args { | ||||
| /* CURRENT_FH: object */ | ||||
| seqid4 seqid; | ||||
| stateid4 open_stateid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CLOSE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union CLOSE4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| stateid4 open_stateid; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CLOSE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CLOSE operation releases share reservations for the regular or | ||||
| named attribute file as specified by the current filehandle. The | ||||
| share reservations and other state information released at the server | ||||
| as a result of this CLOSE are only those associated with the supplied | ||||
| stateid. State associated with other OPENs is not affected. | ||||
| </t> | ||||
| <t> | ||||
| If byte-range locks are held, the client <bcp14>SHOULD</bcp14> release all locks before | ||||
| sending a CLOSE. The server <bcp14>MAY</bcp14> free all outstanding locks on CLOSE, | ||||
| but some servers may not support the CLOSE of a file that still has | ||||
| byte-range locks held. The server <bcp14>MUST</bcp14> return failure if any locks would | ||||
| exist after the CLOSE. | ||||
| </t> | ||||
| <t> | ||||
| The argument seqid <bcp14>MAY</bcp14> have any value, and the server <bcp14>MUST</bcp14> ignore seqid. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> require that the combination of principal, security | ||||
| flavor, and, if applicable, GSS mechanism | ||||
| that sent the OPEN request also be the one to CLOSE | ||||
| the file. This might not be possible if credentials | ||||
| for the principal are no longer available. The server | ||||
| <bcp14>MAY</bcp14> allow the machine credential or SSV credential | ||||
| (see <xref target="OP_EXCHANGE_ID" format="default"/>) to send CLOSE. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CLOSE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Even though CLOSE returns a stateid, this stateid is not useful to the | ||||
| client and should be treated as deprecated. CLOSE "shuts down" the | ||||
| state associated with all OPENs for the file by a single open-owner. | ||||
| As noted above, CLOSE will either release all file-locking state or | ||||
| return an error. Therefore, the stateid returned by CLOSE is not | ||||
| useful for operations that follow. To help find any uses of | ||||
| this stateid by clients, the server <bcp14>SHOULD</bcp14> return the invalid | ||||
| special stateid (the "other" value is zero and the "seqid" field | ||||
| is NFS4_UINT32_MAX, see <xref target="special_stateid" format="default"/>). | ||||
| </t> | ||||
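| <t> | ||||
| A client that wants to catch accidental reuse of the stateid returned | ||||
| by CLOSE could test for the invalid special stateid along these lines. | ||||
| This is a hedged Python sketch; the function name is hypothetical, and | ||||
| the 12-byte length of the "other" field is assumed from the stateid4 | ||||
| definition. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
NFS4_UINT32_MAX = 0xFFFFFFFF
NFS4_OTHER_SIZE = 12  # assumed length of stateid4.other, in bytes


def is_invalid_special_stateid(seqid, other):
    """True for the invalid special stateid: other all zero, seqid max."""
    return seqid == NFS4_UINT32_MAX and other == bytes(NFS4_OTHER_SIZE)
```

| ]]></sourcecode> | ||||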
| <t> | ||||
| A CLOSE operation may make delegations grantable | ||||
| where they were not previously. Servers may choose to respond | ||||
| immediately if there are pending delegation want requests or may | ||||
| respond to the situation at a later time. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="OP_COMMIT" numbered="true" toc="default"> | ||||
| <name>Operation 5: COMMIT - Commit Cached Data</name> | ||||
| <section toc="exclude" anchor="OP_COMMIT_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct COMMIT4args { | ||||
| /* CURRENT_FH: file */ | ||||
| offset4 offset; | ||||
| count4 count; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMMIT_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct COMMIT4resok { | ||||
| verifier4 writeverf; | ||||
| }; | ||||
| union COMMIT4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| COMMIT4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMMIT_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The COMMIT operation forces or flushes uncommitted, modified data to stable storage for the | ||||
| file specified by the current filehandle. The flushed data is that | ||||
| which was previously written with one or more WRITE operations that had the | ||||
| "committed" field of their results field set to UNSTABLE4. | ||||
| </t> | ||||
| <t> | ||||
| The offset specifies the position within the file where the flush is | ||||
| to begin. An offset value of zero means to flush data starting at | ||||
| the beginning of the file. The count specifies the number of bytes of | ||||
| data to flush. If the count is zero, a flush from the offset to the end | ||||
| of the file is done. | ||||
| </t> | ||||
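| <t> | ||||
| The offset and count rules above can be sketched as follows. This is | ||||
| a minimal Python illustration; commit_range is a hypothetical helper, | ||||
| and None is used here to stand for "through the end of the file". | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
def commit_range(offset, count):
    """Return the (start, end) byte range that a COMMIT asks to flush.

    end is exclusive; None means the flush runs to the end of the file.
    """
    if count == 0:
        # A zero count flushes from offset to the end of the file.
        return offset, None
    return offset, offset + count
```

| ]]></sourcecode> | ||||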
| <t> | ||||
| The server returns a write verifier upon successful completion of the | ||||
| COMMIT. The write verifier is used by the client to determine if the | ||||
| server has restarted between the initial WRITE operations and the | ||||
| COMMIT. The client does this by comparing the write verifier returned | ||||
| from the initial WRITE operations and the verifier returned by the COMMIT | ||||
| operation. The server must vary the value of the write verifier at | ||||
| each server event or instantiation that may lead to a loss of | ||||
| uncommitted data. Most commonly this occurs when the server is | ||||
| restarted; however, other events at the server may result in | ||||
| uncommitted data loss as well. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_COMMIT_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The COMMIT operation is similar in operation and semantics to the | ||||
| <xref target="fsync" format="default">POSIX fsync()</xref> system interface that synchronizes a file's state with the | ||||
| disk (file data and metadata are flushed to disk or stable | ||||
| storage). COMMIT performs the same operation for a client, flushing | ||||
| any unsynchronized data and metadata on the server to the server's | ||||
| disk or stable storage for the specified file. As with fsync(), there | ||||
| may or may not be any modified data to | ||||
| synchronize. The data may already have been synchronized by the server's | ||||
| normal periodic buffer synchronization activity. COMMIT should return | ||||
| NFS4_OK, unless there has been an unexpected error. | ||||
| </t> | ||||
| <t> | ||||
| COMMIT differs from fsync() in that it is possible for the client to | ||||
| flush a range of the file (most likely triggered by a | ||||
| buffer-reclamation scheme on the client before the file has been | ||||
| completely written). | ||||
| </t> | ||||
| <t> | ||||
| The server implementation of COMMIT is reasonably simple. If the | ||||
| server receives a full file COMMIT request, that is, starting at offset | ||||
| zero and count zero, it should do the equivalent of applying fsync() to | ||||
| the entire file. | ||||
| Otherwise, it should arrange for the modified data in the range | ||||
| specified by offset and count to be flushed to stable storage. In | ||||
| both cases, any metadata associated with the file must be flushed to | ||||
| stable storage before returning. It is not an error for there to be | ||||
| nothing to flush on the server. This means that the data and metadata | ||||
| that needed to be flushed have already been flushed or lost during the | ||||
| last server failure. | ||||
| </t> | ||||
| <t> | ||||
| The client implementation of COMMIT is a little more complex. There | ||||
| are two reasons for wanting to commit a client buffer to stable | ||||
| storage. The first is that the client wants to reuse a buffer. In | ||||
| this case, the offset and count of the buffer are sent to the server | ||||
| in the COMMIT request. The server then flushes any modified data based | ||||
| on the offset and count, and flushes any modified metadata associated with the | ||||
| file. It then returns the status of the flush and the write verifier. | ||||
| The second reason for the client to generate a COMMIT is for a full | ||||
| file flush, such as may be done at close. In this case, the client | ||||
| would gather all of the buffers for this file that contain uncommitted | ||||
| data, do the COMMIT operation with an offset of zero and count of zero, and | ||||
| then free all of those buffers. Any other dirty buffers would be sent | ||||
| to the server in the normal fashion. | ||||
| </t> | ||||
| <t> | ||||
| After a buffer is written (via the WRITE operation) | ||||
| by the client with the "committed" field in the result of WRITE | ||||
| set to UNSTABLE4, the buffer must be considered as modified by | ||||
| the client | ||||
| until the buffer has either been flushed via a COMMIT operation or | ||||
| written via a WRITE operation with the "committed" field in the | ||||
| result set to FILE_SYNC4 | ||||
| or DATA_SYNC4. This is done to prevent the buffer from being freed and | ||||
| reused before the data can be flushed to stable storage on the server. | ||||
| </t> | ||||
| <t> | ||||
| When a response is returned from either a WRITE or a COMMIT operation | ||||
| and it contains a write verifier that differs from that previously | ||||
| returned by the server, the client will need to retransmit all of the | ||||
| buffers containing uncommitted data to the server. How this is | ||||
| to be done is up to the implementor. If there is only one buffer of | ||||
| interest, then it should be sent in a WRITE request | ||||
| with the FILE_SYNC4 stable parameter. If there is more than one | ||||
| buffer, it might be worthwhile retransmitting all of the buffers in | ||||
| WRITE operations with the stable parameter set to UNSTABLE4 and then | ||||
| retransmitting the COMMIT operation to flush all of the data on the | ||||
| server to stable storage. However, if the server repeatedly | ||||
| returns from COMMIT a verifier that differs from that returned | ||||
| by WRITE, the only way to ensure progress is to retransmit all | ||||
| of the buffers with WRITE requests with the FILE_SYNC4 stable parameter. | ||||
| </t> | ||||
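| <t> | ||||
| The client-side handling of a changed write verifier can be sketched | ||||
| as follows. This is an illustrative Python sketch, not a definitive | ||||
| implementation: the "pending" mapping, both function names, and the | ||||
| verifier_keeps_changing flag are assumptions made for the example. | ||||
| Verifiers are treated as opaque byte strings and only compared for | ||||
| equality. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||

```python
def buffers_to_retransmit(pending, commit_verifier):
    """Return offsets of buffers whose data may have been lost.

    pending maps a buffer offset to the write verifier returned by the
    UNSTABLE4 WRITE that sent that buffer (hypothetical bookkeeping).
    """
    return sorted(
        offset for offset, verifier in pending.items()
        if verifier != commit_verifier
    )


def choose_stable_how(buffer_count, verifier_keeps_changing=False):
    """Pick the stable parameter for the retransmitted WRITE requests."""
    if buffer_count == 1 or verifier_keeps_changing:
        # One buffer, or no progress with COMMIT: write synchronously.
        return "FILE_SYNC4"
    # Several buffers: resend unstably, then send COMMIT again.
    return "UNSTABLE4"
```

| ]]></sourcecode> | ||||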
| <t> | ||||
| The above description applies to page-cache-based systems as well as | ||||
| buffer-cache-based systems. In the former systems, the virtual memory | ||||
| system will need to be modified instead of the buffer cache. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CREATE" numbered="true" toc="default"> | ||||
| <name>Operation 6: CREATE - Create a Non-Regular File Object</name> | ||||
| <section toc="exclude" anchor="OP_CREATE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union createtype4 switch (nfs_ftype4 type) { | ||||
| case NF4LNK: | ||||
| linktext4 linkdata; | ||||
| case NF4BLK: | ||||
| case NF4CHR: | ||||
| specdata4 devdata; | ||||
| case NF4SOCK: | ||||
| case NF4FIFO: | ||||
| case NF4DIR: | ||||
| void; | ||||
| default: | ||||
| void; /* server should return NFS4ERR_BADTYPE */ | ||||
| }; | ||||
| struct CREATE4args { | ||||
| /* CURRENT_FH: directory for creation */ | ||||
| createtype4 objtype; | ||||
| component4 objname; | ||||
| fattr4 createattrs; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CREATE4resok { | ||||
| change_info4 cinfo; | ||||
| bitmap4 attrset; /* attributes set */ | ||||
| }; | ||||
| union CREATE4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| /* new CURRENT_FH: created object */ | ||||
| CREATE4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CREATE operation creates a file object other than an | ||||
| ordinary file in a directory with a given name. | ||||
| The OPEN operation <bcp14>MUST</bcp14> be used to create a | ||||
| regular file or a named attribute. | ||||
| </t> | ||||
| <t> | ||||
| The current filehandle must be a directory: an object of type NF4DIR. If the current | ||||
| filehandle is an attribute directory (type NF4ATTRDIR), the | ||||
| error NFS4ERR_WRONG_TYPE is returned. If the current filehandle | ||||
| designates any other type of object, the error NFS4ERR_NOTDIR | ||||
| results. | ||||
| </t> | ||||
| <t> | ||||
| The objname specifies the name for the new object. | ||||
| The objtype determines the type of object to be | ||||
| created: directory, symlink, etc. If the object | ||||
| type specified is that of an ordinary file, a | ||||
| named attribute, or a named attribute directory, | ||||
| the error NFS4ERR_BADTYPE results. | ||||
| </t> | ||||
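| <t> | ||||
| The type checks described above can be sketched in Python as follows. | ||||
| This is an illustrative sketch only: the helper name check_create_types, | ||||
| the use of status names as strings, and the order in which the checks | ||||
| are applied are assumptions, not requirements of the protocol. An | ||||
| objtype outside the creatable set, including an unknown value, is | ||||
| treated as NFS4ERR_BADTYPE, mirroring the default arm of createtype4. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from enum import IntEnum


class FType(IntEnum):
    """nfs_ftype4 values from the protocol's XDR definition."""
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7
    NF4ATTRDIR = 8
    NF4NAMEDATTR = 9


# Object types that CREATE can produce; everything else is rejected.
_CREATABLE = {
    FType.NF4LNK, FType.NF4BLK, FType.NF4CHR,
    FType.NF4SOCK, FType.NF4FIFO, FType.NF4DIR,
}


def check_create_types(current_fh_type, objtype):
    """Return a status name for the type checks only (hypothetical helper).

    The check order is an assumption; the text does not mandate one.
    """
    if current_fh_type == FType.NF4ATTRDIR:
        return "NFS4ERR_WRONG_TYPE"
    if current_fh_type != FType.NF4DIR:
        return "NFS4ERR_NOTDIR"
    if objtype not in _CREATABLE:
        return "NFS4ERR_BADTYPE"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||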
| <t> | ||||
| If an object of the same name already exists in the directory, the | ||||
| server will return the error NFS4ERR_EXIST. | ||||
| </t> | ||||
| <t> | ||||
| For the directory where the new file object was created, the server | ||||
| returns change_info4 information in cinfo. With the atomic field of | ||||
| the change_info4 data type, the server will indicate if the before and | ||||
| after change attributes were obtained atomically with respect to the | ||||
| file object creation. | ||||
| </t> | ||||
| <t> | ||||
| If the objname has a length of zero, or if objname does not obey | ||||
| the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | ||||
| </t> | ||||
| <t> | ||||
| The current filehandle is replaced by that of the new object. | ||||
| </t> | ||||
| <t> | ||||
| The createattrs specifies the initial set of attributes for the | ||||
| object. The set of attributes may include any writable attribute | ||||
| valid for the object type. When the operation is successful, the | ||||
| server will return to the client an attribute mask signifying which | ||||
| attributes were successfully set for the object. | ||||
| </t> | ||||
| <t> | ||||
| If createattrs includes neither the owner attribute nor an ACL with an | ||||
| ACE for the owner, and if the server's file system both supports and | ||||
| requires an owner attribute (or an owner ACE), then the server <bcp14>MUST</bcp14> | ||||
| derive the owner (or the owner ACE). This would typically be from the | ||||
| principal indicated in the RPC credentials of the call, but the | ||||
| server's operating environment or file system semantics may dictate | ||||
| other methods of derivation. Similarly, if createattrs includes | ||||
| neither the group attribute nor a group ACE, and if the server's | ||||
| file system both supports and requires the notion of a group attribute | ||||
| (or group ACE), the server <bcp14>MUST</bcp14> derive the group attribute (or the | ||||
| corresponding group ACE) for the file. This could be from the RPC | ||||
| call's credentials, such as the group principal if the credentials | ||||
| include it (such as with AUTH_SYS), from the group identifier | ||||
| associated with the principal in the credentials (e.g., POSIX | ||||
| systems have a <xref target="passwd" format="default">user database</xref> that has a group identifier for every | ||||
| user identifier), inherited from the directory in which the object is created, | ||||
| or whatever else the server's operating environment or file system | ||||
| semantics dictate. This applies to the OPEN operation too. | ||||
| </t> | ||||
| <t> | ||||
| Conversely, it is possible that the client will specify in createattrs an | ||||
| owner attribute, group attribute, or ACL that the principal indicated | ||||
| in the RPC call's credentials does not have permission to create files | ||||
| for. The error to be returned in this instance is NFS4ERR_PERM. This | ||||
| applies to the OPEN operation too. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle designates a directory for which another | ||||
| client holds a directory delegation, then, unless the delegation | ||||
| is such that the situation can be resolved by sending a notification, | ||||
| the delegation <bcp14>MUST</bcp14> be recalled, and the CREATE operation <bcp14>MUST NOT</bcp14> proceed | ||||
| until the delegation is returned or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| When the current filehandle designates a directory for which | ||||
| one or more directory delegations exist, then, when those delegations | ||||
| request such notifications, NOTIFY4_ADD_ENTRY will be generated | ||||
| as a result of this operation. | ||||
| </t> | ||||
| <t> | ||||
| If the capability FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set | ||||
| (<xref target="utf8_caps" format="default"/>), | ||||
| and a symbolic link is being created, then the content | ||||
| of the symbolic link <bcp14>MUST</bcp14> be in UTF-8 encoding. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the client desires to set attribute values after the create, a | ||||
| SETATTR operation can be added to the COMPOUND request so that the | ||||
| appropriate attributes will be set. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_DELEGPURGE" numbered="true" toc="default"> | ||||
| <name>Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery</name> | ||||
| <section toc="exclude" anchor="OP_DELEGPURGE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DELEGPURGE4args { | ||||
| clientid4 clientid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DELEGPURGE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DELEGPURGE4res { | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DELEGPURGE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation purges all of the delegations awaiting recovery for a given client. | ||||
| This is useful for clients that do not commit delegation information | ||||
| to stable storage: it indicates to the server that conflicting requests | ||||
| need not be delayed while awaiting recovery of delegation information. | ||||
| </t> | ||||
| <t> | ||||
| The client is NOT specified by the clientid field of | ||||
| the request. The client <bcp14>SHOULD</bcp14> set the clientid field | ||||
| to zero, and the server <bcp14>MUST</bcp14> ignore the clientid | ||||
| field. Instead, the server <bcp14>MUST</bcp14> derive the client ID | ||||
| from the value of the session ID in the arguments of | ||||
| the SEQUENCE operation that precedes DELEGPURGE in | ||||
| the COMPOUND request. | ||||
| </t> | ||||
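| <t> | ||||
| As a minimal sketch of the rule above, the following Python fragment | ||||
| derives the target client from the session named in SEQUENCE and ignores | ||||
| the clientid argument. The function name, the sessions dictionary, and | ||||
| the representation of session IDs as bytes are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def delegpurge_target(sessions, sequence_sessionid, args_clientid):
    """Return the client ID whose delegations are to be purged.

    sessions: hypothetical dict mapping session ID (bytes) to client ID (int).
    sequence_sessionid: session ID from the preceding SEQUENCE operation.
    args_clientid: clientid field of DELEGPURGE4args; deliberately ignored.
    """
    del args_clientid  # the server MUST ignore this field
    return sessions[sequence_sessionid]
```
| ]]></sourcecode> | ||||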
| <t> | ||||
| The DELEGPURGE operation should be used by clients that record delegation | ||||
| information on stable storage on the client. In this case, | ||||
| after the client recovers all delegations it knows of, | ||||
| it should immediately send a DELEGPURGE operation. | ||||
| Doing so will notify the server that | ||||
| no additional delegations for the client will be recovered, allowing it | ||||
| to free resources and to avoid delaying other clients whose requests | ||||
| conflict with the unrecovered delegations. The set of | ||||
| delegations known to the server and the client might be different. The | ||||
| reason for this is that after sending a request that | ||||
| resulted in a delegation, the client might experience a failure | ||||
| before it both received the delegation and | ||||
| committed the delegation to the client's stable storage. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> support DELEGPURGE, but if it does not, it <bcp14>MUST NOT</bcp14> | ||||
| support CLAIM_DELEGATE_PREV and <bcp14>MUST NOT</bcp14> support CLAIM_DELEG_PREV_FH. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_DELEGRETURN" numbered="true" toc="default"> | ||||
| <name>Operation 8: DELEGRETURN - Return Delegation</name> | ||||
| <section toc="exclude" anchor="OP_DELEGRETURN_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DELEGRETURN4args { | ||||
| /* CURRENT_FH: delegated object */ | ||||
| stateid4 deleg_stateid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DELEGRETURN_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DELEGRETURN4res { | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DELEGRETURN_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The DELEGRETURN operation returns the delegation represented by | ||||
| the current filehandle and stateid. | ||||
| </t> | ||||
| <t> | ||||
| Delegations may be returned voluntarily (i.e., before | ||||
| the server has recalled them) or when recalled. In either case, the client must | ||||
| properly propagate state changed under the context of the delegation to | ||||
| the server before returning the delegation. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> require that the principal, security | ||||
| flavor, and if applicable, the GSS mechanism, combination | ||||
| that acquired the delegation also be the one to send | ||||
| DELEGRETURN on the file. This might not be possible | ||||
| if credentials for the principal are no longer | ||||
| available. The server <bcp14>MAY</bcp14> allow the machine credential | ||||
| or SSV credential (see <xref target="OP_EXCHANGE_ID" format="default"/>) to send DELEGRETURN. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_GETATTR" numbered="true" toc="default"> | ||||
| <name>Operation 9: GETATTR - Get Attributes</name> | ||||
| <section toc="exclude" anchor="OP_GETATTR_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETATTR4args { | ||||
| /* CURRENT_FH: object */ | ||||
| bitmap4 attr_request; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETATTR_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETATTR4resok { | ||||
| fattr4 obj_attributes; | ||||
| }; | ||||
| union GETATTR4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| GETATTR4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETATTR_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The GETATTR operation will obtain attributes for the file system | ||||
| object specified by the current filehandle. The client sets a bit in | ||||
| the bitmap argument for each attribute value that it would like the | ||||
| server to return. The server returns an attribute bitmap that | ||||
| indicates the attribute values that it was able to return, | ||||
| which will include all attributes requested by the client that | ||||
| are attributes supported by the server for the target | ||||
| file system. This bitmap is followed by the attribute values ordered | ||||
| lowest attribute number first. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MUST</bcp14> return a value for each attribute that the client | ||||
| requests if the attribute is supported by the server for the target | ||||
| file system. If the server does not support a particular attribute | ||||
| on the target file system, then it <bcp14>MUST NOT</bcp14> return the attribute value | ||||
| and <bcp14>MUST NOT</bcp14> set the attribute bit in the result bitmap. The server | ||||
| <bcp14>MUST</bcp14> return an error if it supports an attribute on the target | ||||
| file system but cannot obtain its value. In that case, no attribute values will | ||||
| be returned. | ||||
| </t> | ||||
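| <t> | ||||
| The bitmap handling described above can be illustrated with a short | ||||
| Python sketch. It assumes that attribute number n is stored as bit | ||||
| (n mod 32) of word (n div 32) of a bitmap4. The function names, the | ||||
| dictionary of attribute values, and the use of LookupError to stand for | ||||
| "an error is returned" are illustrative assumptions; the protocol does | ||||
| not name the error here. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def bitmap_to_attrs(words):
    """Expand a bitmap4 (list of 32-bit words) into ascending attribute numbers."""
    attrs = []
    for word_index, word in enumerate(words):
        for bit in range(32):
            if word & (1 << bit):
                attrs.append(32 * word_index + bit)
    return attrs


def attrs_to_bitmap(attrs):
    """Pack attribute numbers into a bitmap4 with no more words than needed."""
    words = []
    for n in attrs:
        idx, bit = divmod(n, 32)
        while len(words) <= idx:
            words.append(0)
        words[idx] |= 1 << bit
    return words


def getattr_reply(requested, supported, values):
    """Build (result bitmap, values in ascending attribute order).

    requested: bitmap4 words sent by the client.
    supported: set of attribute numbers supported on the target file system.
    values:    dict of attribute number -> value the server could obtain.
    Unsupported attributes are silently dropped from the result bitmap.
    A supported attribute whose value is unavailable fails the whole call.
    """
    returned = [n for n in bitmap_to_attrs(requested) if n in supported]
    missing = [n for n in returned if n not in values]
    if missing:
        raise LookupError("cannot obtain attributes: %r" % missing)
    return attrs_to_bitmap(returned), [values[n] for n in returned]
```
| ]]></sourcecode> | ||||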
| <t> | ||||
| File systems that are absent should be treated as having support for | ||||
| a very small set of attributes as described in | ||||
| <xref target="absent_getattr" format="default"/>, | ||||
| even if previously, when the file system was present, more attributes | ||||
| were supported. | ||||
| </t> | ||||
| <t> | ||||
| All servers <bcp14>MUST</bcp14> support the <bcp14>REQUIRED</bcp14> attributes as specified in | ||||
| <xref target="mandatory_attributes" format="default"/>, for all file systems, | ||||
| with the exception of absent file systems. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETATTR_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Suppose there is an OPEN_DELEGATE_WRITE delegation held by another client for | ||||
| the file in question, and size and/or change are among the set of attributes | ||||
| being interrogated. The server has two choices. | ||||
| First, the server can obtain the actual | ||||
| current value of these attributes from the client holding the delegation | ||||
| by using the CB_GETATTR callback. Second, the server, particularly when the | ||||
| delegated client is unresponsive, can recall the | ||||
| delegation in question. The GETATTR <bcp14>MUST NOT</bcp14> proceed | ||||
| until one of the following occurs: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The requested attribute values are returned in the response to | ||||
| CB_GETATTR. | ||||
| </li> | ||||
| <li> | ||||
| The OPEN_DELEGATE_WRITE delegation is returned. | ||||
| </li> | ||||
| <li> | ||||
| The OPEN_DELEGATE_WRITE delegation is revoked. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Unless one of the above happens very quickly, | ||||
| one or more NFS4ERR_DELAY errors will be returned | ||||
| while a delegation is outstanding. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_GETFH" numbered="true" toc="default"> | ||||
| <name>Operation 10: GETFH - Get Current Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_GETFH_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* CURRENT_FH: */ | ||||
| void; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETFH_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETFH4resok { | ||||
| nfs_fh4 object; | ||||
| }; | ||||
| union GETFH4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| GETFH4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation returns the current filehandle value. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| As described in <xref target="COMPOUND_Sizing_Issues" format="default"/>, GETFH | ||||
| is <bcp14>REQUIRED</bcp14> or <bcp14>RECOMMENDED</bcp14> to | ||||
| immediately follow certain operations, and servers | ||||
| are free to reject such operations if | ||||
| the client fails to insert | ||||
| GETFH in the request as <bcp14>REQUIRED</bcp14> or <bcp14>RECOMMENDED</bcp14>. | ||||
| <xref target="open_getfh_issue" format="default"/> provides additional | ||||
| justification for why GETFH <bcp14>MUST</bcp14> follow OPEN. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Operations that change the current filehandle like LOOKUP or CREATE do | ||||
| not automatically return the new filehandle as a result. For | ||||
| instance, if a client needs to look up a directory entry and obtain its | ||||
| filehandle, then the following request is needed. | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| PUTFH (directory filehandle) | ||||
| </li> | ||||
| <li> | ||||
| LOOKUP (entry name) | ||||
| </li> | ||||
| <li> | ||||
| GETFH | ||||
| </li> | ||||
| </ul> | ||||
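| <t> | ||||
| The request listed above could be assembled as in the following Python | ||||
| sketch. The tuple-based encoding and the function name are hypothetical; | ||||
| the sketch mirrors only the three operations in the list, whereas a real | ||||
| NFSv4.1 COMPOUND would normally begin with a SEQUENCE operation. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def lookup_and_getfh(dir_fh, name):
    """Build the operation list PUTFH, LOOKUP, GETFH (hypothetical encoding).

    dir_fh: opaque filehandle of the directory (bytes).
    name:   directory entry to look up (str).
    """
    return [
        ("PUTFH", dir_fh),   # make the directory the current filehandle
        ("LOOKUP", name),    # current filehandle becomes the entry
        ("GETFH",),          # return the new current filehandle
    ]
```
| ]]></sourcecode> | ||||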
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LINK" numbered="true" toc="default"> | ||||
| <name>Operation 11: LINK - Create Link to a File</name> | ||||
| <section toc="exclude" anchor="OP_LINK_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LINK4args { | ||||
| /* SAVED_FH: source object */ | ||||
| /* CURRENT_FH: target directory */ | ||||
| component4 newname; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LINK_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LINK4resok { | ||||
| change_info4 cinfo; | ||||
| }; | ||||
| union LINK4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| LINK4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LINK_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LINK operation creates an additional newname for the file | ||||
| represented by the saved filehandle, as set by the SAVEFH operation, | ||||
| in the directory represented by the current filehandle. The existing | ||||
| file and the target directory must reside within the same file system | ||||
| on the server. On success, the current filehandle will continue to be | ||||
| the target directory. If an object exists in the target directory | ||||
| with the same name as newname, the server must return NFS4ERR_EXIST. | ||||
| </t> | ||||
| <t> | ||||
| For the target directory, the server returns change_info4 information | ||||
| in cinfo. With the atomic field of the change_info4 data type, the | ||||
| server will indicate if the before and after change attributes were | ||||
| obtained atomically with respect to the link creation. | ||||
| </t> | ||||
| <t> | ||||
| If the newname has a length of zero, or if newname does not obey | ||||
| the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LINK_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> impose restrictions on the LINK operation such that | ||||
| LINK may not be done when the file is open or when that open is done | ||||
| by particular protocols, or with particular options or access modes. | ||||
| When LINK is rejected because of such restrictions, the error | ||||
| NFS4ERR_FILE_OPEN is returned. | ||||
| </t> | ||||
| <t> | ||||
| If a server does implement such restrictions and those restrictions | ||||
| include cases of NFSv4 opens preventing successful execution of | ||||
| a link, the server needs to recall any delegations that could | ||||
| hide the existence of opens relevant to that decision. The reason | ||||
| is that when a client holds a delegation, the server | ||||
| might not have an accurate account of the opens for that client, since | ||||
| the client may execute OPENs and CLOSEs locally. The LINK operation | ||||
| must be delayed only until a definitive result can be obtained. | ||||
| For example, suppose there are multiple delegations and one of them establishes | ||||
| an open whose presence would prevent the link. Given the server's | ||||
| semantics, NFS4ERR_FILE_OPEN may be returned to the caller as soon | ||||
| as that delegation is returned without waiting for other delegations | ||||
| to be returned. Similarly, if such opens are not associated with | ||||
| delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | ||||
| delegation recall being done. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle designates a directory for which another | ||||
| client holds a directory delegation, then, unless the delegation | ||||
| is such that the situation can be resolved by sending a notification, | ||||
| the delegation <bcp14>MUST</bcp14> be recalled, and the operation cannot be | ||||
| performed successfully until the delegation is returned or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| When the current filehandle designates a directory for which | ||||
| one or more directory delegations exist, then, when those delegations | ||||
| request such notifications, instead of a recall, | ||||
| NOTIFY4_ADD_ENTRY will be generated | ||||
| as a result of the LINK operation. | ||||
| </t> | ||||
| <t> | ||||
| If the current file system supports the numlinks attribute, and | ||||
| other clients have delegations to the file being linked, then those | ||||
| delegations <bcp14>MUST</bcp14> be recalled and the LINK operation <bcp14>MUST NOT</bcp14> proceed until | ||||
| all delegations are returned or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while any delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| Changes to any property of the "hard" linked files are reflected in | ||||
| all of the linked files. When a link is made to a file, the | ||||
| attributes for the file should have a value for numlinks that is one | ||||
| greater than the value before the LINK operation. | ||||
| </t> | ||||
| <t> | ||||
| The statement "file and the target directory must reside within the | ||||
| same file system on the server" means that the fsid fields in the | ||||
| attributes for the objects are the same. If they reside on | ||||
| different file systems, the error NFS4ERR_XDEV is returned. | ||||
| This error may be returned by some servers when there is an | ||||
| internal partitioning of a file system that the LINK operation | ||||
| would violate. | ||||
| </t> | ||||
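| <t> | ||||
| A minimal sketch of the same-file-system test, assuming the server has | ||||
| the fsid attribute (major and minor values) of both objects at hand. | ||||
| The Fsid type and the helper name are illustrative; servers with | ||||
| internal partitioning may return NFS4ERR_XDEV in additional cases that | ||||
| this sketch does not model. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import NamedTuple


class Fsid(NamedTuple):
    """The two fields of the fsid attribute."""
    major: int
    minor: int


def link_fs_check(source_fsid, target_dir_fsid):
    """Return NFS4ERR_XDEV unless both objects report the same fsid."""
    if source_fsid != target_dir_fsid:
        return "NFS4ERR_XDEV"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||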
| <t> | ||||
| On some | ||||
| servers, "." and ".." are illegal values for newname | ||||
| and the error NFS4ERR_BADNAME will be returned if they are specified. | ||||
| </t> | ||||
| <t> | ||||
| When the current filehandle designates a named attribute directory | ||||
| and the object to be linked (the saved filehandle) is not a named | ||||
| attribute for the same object, the error NFS4ERR_XDEV <bcp14>MUST</bcp14> be | ||||
| returned. When the saved filehandle designates a named attribute | ||||
| and the current filehandle is not the appropriate named attribute | ||||
| directory, the error NFS4ERR_XDEV <bcp14>MUST</bcp14> also be returned. | ||||
| </t> | ||||
| <t> | ||||
| When the current filehandle designates a named attribute directory | ||||
| and the object to be linked (the saved filehandle) is a named | ||||
| attribute within that directory, the server may return | ||||
| the error NFS4ERR_NOTSUPP. | ||||
| </t> | ||||
| <t> | ||||
| In the case that newname is already linked to the file represented by | ||||
| the saved filehandle, the server will return NFS4ERR_EXIST. | ||||
| </t> | ||||
| <t> | ||||
| Note that symbolic links are created with the CREATE operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LOCK" numbered="true" toc="default"> | ||||
| <name>Operation 12: LOCK - Create Lock</name> | ||||
| <section toc="exclude" anchor="OP_LOCK_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * For LOCK, transition from open_stateid and lock_owner | ||||
| * to a lock stateid. | ||||
| */ | ||||
| struct open_to_lock_owner4 { | ||||
| seqid4 open_seqid; | ||||
| stateid4 open_stateid; | ||||
| seqid4 lock_seqid; | ||||
| lock_owner4 lock_owner; | ||||
| }; | ||||
| /* | ||||
| * For LOCK, existing lock stateid continues to request new | ||||
| * file lock for the same lock_owner and open_stateid. | ||||
| */ | ||||
| struct exist_lock_owner4 { | ||||
| stateid4 lock_stateid; | ||||
| seqid4 lock_seqid; | ||||
| }; | ||||
| union locker4 switch (bool new_lock_owner) { | ||||
| case TRUE: | ||||
| open_to_lock_owner4 open_owner; | ||||
| case FALSE: | ||||
| exist_lock_owner4 lock_owner; | ||||
| }; | ||||
| /* | ||||
| * LOCK/LOCKT/LOCKU: Record lock management | ||||
| */ | ||||
| struct LOCK4args { | ||||
| /* CURRENT_FH: file */ | ||||
| nfs_lock_type4 locktype; | ||||
| bool reclaim; | ||||
| offset4 offset; | ||||
| length4 length; | ||||
| locker4 locker; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCK_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOCK4denied { | ||||
| offset4 offset; | ||||
| length4 length; | ||||
| nfs_lock_type4 locktype; | ||||
| lock_owner4 owner; | ||||
| }; | ||||
| struct LOCK4resok { | ||||
| stateid4 lock_stateid; | ||||
| }; | ||||
| union LOCK4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| LOCK4resok resok4; | ||||
| case NFS4ERR_DENIED: | ||||
| LOCK4denied denied; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCK_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LOCK operation requests a byte-range lock for the byte-range specified | ||||
| by the offset and length parameters, and lock type specified in | ||||
| the locktype parameter. If this is a reclaim request, the | ||||
| reclaim parameter will be TRUE. | ||||
| </t> | ||||
| <t> | ||||
| Bytes in a file may be locked even if those bytes are not currently | ||||
| allocated to the file. To lock the file from a specific offset | ||||
| through the end-of-file (no matter how long the file actually is) use | ||||
| a length field equal to NFS4_UINT64_MAX. | ||||
| The server <bcp14>MUST</bcp14> return NFS4ERR_INVAL under the following | ||||
| combinations of length and offset: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Length is equal to zero. | ||||
| </li> | ||||
| <li> | ||||
| Length is not equal to NFS4_UINT64_MAX, and the sum of length | ||||
| and offset exceeds NFS4_UINT64_MAX. | ||||
| </li> | ||||
| </ul> | ||||
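| <t> | ||||
| The two NFS4ERR_INVAL conditions above can be expressed as a short | ||||
| Python check. The function name and the use of status names as strings | ||||
| are illustrative assumptions. Python integers do not wrap, so the sum | ||||
| can be compared directly; an implementation using 64-bit arithmetic | ||||
| would need to guard against overflow. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def validate_lock_range(offset, length):
    """Apply the NFS4ERR_INVAL rules for LOCK offset and length."""
    if length == 0:
        return "NFS4ERR_INVAL"
    # A length of all ones means "through end-of-file" and is always accepted.
    if length != NFS4_UINT64_MAX and offset + length > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||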
| <t> | ||||
| 32-bit servers are servers that support locking for | ||||
| byte offsets that fit within 32 bits (i.e., less than | ||||
| or equal to NFS4_UINT32_MAX). If the client specifies a | ||||
| range that overlaps one or more bytes beyond offset | ||||
| NFS4_UINT32_MAX but does not end at offset | ||||
| NFS4_UINT64_MAX, then such a 32-bit server <bcp14>MUST</bcp14> return the | ||||
| error NFS4ERR_BAD_RANGE. | ||||
| </t> | ||||
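| <t> | ||||
| One possible reading of the 32-bit rule is sketched below in Python. | ||||
| It assumes that offset and length have already passed the NFS4ERR_INVAL | ||||
| checks, and that the only range ending at offset NFS4_UINT64_MAX is one | ||||
| whose length is NFS4_UINT64_MAX (a lock through end-of-file). The | ||||
| function name is hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_UINT32_MAX = 0xFFFFFFFF
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def check_range_32bit(offset, length):
    """Range check for a server that only locks 32-bit byte offsets.

    Assumes length is nonzero and the range was already validated.
    """
    if length == NFS4_UINT64_MAX:
        return "NFS4_OK"  # lock runs through end-of-file
    last = offset + length - 1  # offset of the last byte covered
    if last > NFS4_UINT32_MAX:
        return "NFS4ERR_BAD_RANGE"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||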
| <t> | ||||
| If the server returns NFS4ERR_DENIED, the | ||||
| owner, offset, and length | ||||
| of a conflicting lock are returned. | ||||
| </t> | ||||
| <t> | ||||
| The locker argument specifies the lock-owner that is associated with | ||||
| the LOCK operation. The locker4 structure is a switched union that | ||||
| indicates whether the client has already created byte-range locking | ||||
| state associated with the current open file and lock-owner. In the | ||||
| case in which it has, the argument is just a stateid representing | ||||
| the set of | ||||
| locks associated with that open file and lock-owner, together with | ||||
| a lock_seqid value that <bcp14>MAY</bcp14> be any value and <bcp14>MUST</bcp14> be ignored | ||||
| by the server. | ||||
| In the case where no byte-range locking state has been established, or the client | ||||
| does not have the stateid available, the argument contains the | ||||
| stateid of the open file with which this lock is to be associated, | ||||
| together with the lock-owner with which the lock is to be associated. | ||||
| The open_to_lock_owner case covers the very first lock done by a | ||||
| lock-owner for a given open file and offers a method to use the | ||||
| established state of the open_stateid to transition to the use of | ||||
| a lock stateid. | ||||
| </t> | ||||
| <t> | ||||
| The following fields of the locker parameter <bcp14>MAY</bcp14> be | ||||
| set to any value by the client and <bcp14>MUST</bcp14> be ignored | ||||
| by the server: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The clientid field of the lock_owner | ||||
| field of the open_owner field | ||||
| (locker.open_owner.lock_owner.clientid). The | ||||
| reason the server <bcp14>MUST</bcp14> ignore the clientid field | ||||
| is that the server <bcp14>MUST</bcp14> derive the client ID from | ||||
| the session ID from the SEQUENCE operation of the | ||||
| COMPOUND request. | ||||
| </li> | ||||
| <li> | ||||
| The open_seqid and lock_seqid fields of the | ||||
| open_owner field (locker.open_owner.open_seqid and | ||||
| locker.open_owner.lock_seqid). | ||||
| </li> | ||||
| <li> | ||||
| The lock_seqid field of the lock_owner field | ||||
| (locker.lock_owner.lock_seqid). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Note that the client ID appearing in a LOCK4denied | ||||
| structure is the actual client associated with the | ||||
| conflicting lock, whether this is the client ID | ||||
| associated with the current session or a different | ||||
| one. Thus, if the server returns NFS4ERR_DENIED, | ||||
| it <bcp14>MUST</bcp14> set the clientid field of the owner field of the | ||||
| denied field. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is not an ordinary file, an error will be | ||||
| returned to the client. In the case that the current filehandle | ||||
| represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. | ||||
| If the current filehandle designates a symbolic link, | ||||
| NFS4ERR_SYMLINK is returned. In all other cases, | ||||
| NFS4ERR_WRONG_TYPE is returned. | ||||
| </t> | ||||
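| <t> | ||||
| The mapping from the type of the current filehandle to the LOCK result | ||||
| can be sketched in Python as follows. Representing the object type and | ||||
| status by their names, and the helper name itself, are illustrative | ||||
| assumptions. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def lock_type_check(ftype):
    """Return the status LOCK yields for a given object type name."""
    if ftype == "NF4REG":
        return "NFS4_OK"
    if ftype == "NF4DIR":
        return "NFS4ERR_ISDIR"
    if ftype == "NF4LNK":
        return "NFS4ERR_SYMLINK"
    # Every other non-regular object type.
    return "NFS4ERR_WRONG_TYPE"
```
| ]]></sourcecode> | ||||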
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCK_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the server is unable to determine the exact offset and length of | ||||
| the conflicting byte-range lock, the same offset and length that were provided in | ||||
| the arguments should be returned in the denied results. | ||||
| </t> | ||||
| <t> | ||||
| LOCK operations are subject to permission checks and to checks against | ||||
| the access type of the associated file. However, the specific rights | ||||
| and modes required for various types of locks reflect the semantics of | ||||
| the server-exported file system, and are not specified by the protocol. | ||||
| For example, Windows 2000 allows a write lock of a file open for read access, | ||||
| while a POSIX-compliant system does not. | ||||
| </t> | ||||
| <t> | ||||
| When the client sends a LOCK operation that corresponds to a range that | ||||
| the lock-owner has locked already (with the same or different lock | ||||
| type), or to a sub-range of such a range, or to a byte-range that | ||||
| includes multiple locks already granted to that lock-owner, in whole or | ||||
| in part, and the server does not support such locking operations | ||||
| (i.e., does not support POSIX locking semantics), the server will | ||||
| return the error NFS4ERR_LOCK_RANGE. In that case, the client may | ||||
| return an error, or it may emulate the required operations, using only | ||||
| LOCK for ranges that do not include any bytes already locked by that | ||||
| lock-owner and LOCKU of locks held by that lock-owner (specifying an | ||||
| exactly matching range and type). Similarly, when the client sends a | ||||
| LOCK operation that amounts to upgrading (changing from a READ_LT lock to a | ||||
| WRITE_LT lock) or downgrading (changing from WRITE_LT lock to a READ_LT lock) | ||||
| an existing byte-range lock, and the server does not support such a lock, | ||||
| the server will return NFS4ERR_LOCK_NOTSUPP. Such operations may not | ||||
| perfectly reflect the required semantics in the face of conflicting | ||||
| LOCK operations from other clients. | ||||
| </t> | ||||
| <t> | ||||
| When a client holds an OPEN_DELEGATE_WRITE delegation, the client holding that | ||||
| delegation is assured that there are no opens by other clients. | ||||
| Thus, there can be no conflicting LOCK operations from such clients. | ||||
| Therefore, the client may handle locking requests locally, | ||||
| without doing LOCK operations on the server. If it does that, it must be | ||||
| prepared to update the lock status on the server, by sending | ||||
| appropriate LOCK and LOCKU operations before returning | ||||
| the delegation. | ||||
| </t> | ||||
| <t> | ||||
| When one or more clients hold OPEN_DELEGATE_READ delegations, any LOCK operation | ||||
| where the server is implementing mandatory locking semantics <bcp14>MUST</bcp14> | ||||
| result in the recall of all such delegations. The LOCK operation may | ||||
| not be granted until all such delegations are returned or revoked. | ||||
| Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LOCKT" numbered="true" toc="default"> | ||||
| <name>Operation 13: LOCKT - Test for Lock</name> | ||||
| <section toc="exclude" anchor="OP_LOCKT_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOCKT4args { | ||||
| /* CURRENT_FH: file */ | ||||
| nfs_lock_type4 locktype; | ||||
| offset4 offset; | ||||
| length4 length; | ||||
| lock_owner4 owner; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKT_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union LOCKT4res switch (nfsstat4 status) { | ||||
| case NFS4ERR_DENIED: | ||||
| LOCK4denied denied; | ||||
| case NFS4_OK: | ||||
| void; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKT_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LOCKT operation tests the lock as specified in the arguments. If | ||||
| a conflicting lock exists, the owner, offset, length, and type of the | ||||
| conflicting lock are returned. | ||||
| The owner field in the results includes the client ID of the owner of | ||||
| the conflicting lock, whether this is the client ID associated with the | ||||
| current session or a different client ID. | ||||
| If no conflicting lock exists, nothing other than | ||||
| NFS4_OK is returned. Lock types READ_LT and READW_LT are processed in | ||||
| the same way in that a conflicting lock test is done without regard to | ||||
| blocking or non-blocking. The same is true for WRITE_LT and WRITEW_LT. | ||||
| </t> | ||||
| <t> | ||||
| The ranges are specified as for LOCK. The NFS4ERR_INVAL and | ||||
| NFS4ERR_BAD_RANGE errors are returned under the same circumstances | ||||
| as for LOCK. | ||||
| </t> | ||||
| <t> | ||||
| The clientid field of the owner <bcp14>MAY</bcp14> be set to | ||||
| any value by the client and <bcp14>MUST</bcp14> be ignored by | ||||
| the server. The reason the server <bcp14>MUST</bcp14> ignore the | ||||
| clientid field is that the server <bcp14>MUST</bcp14> derive the | ||||
| client ID from the session ID from the SEQUENCE | ||||
| operation of the COMPOUND request. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is not an ordinary file, an error will be | ||||
| returned to the client. In the case that the current filehandle | ||||
| represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. | ||||
| If the current filehandle designates a symbolic link, | ||||
| NFS4ERR_SYMLINK is returned. In all other cases, | ||||
| NFS4ERR_WRONG_TYPE is returned. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKT_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the server is unable to determine the exact offset | ||||
| and length of the conflicting lock, the same offset | ||||
| and length that were provided in the arguments should | ||||
| be returned in the denied results. | ||||
| </t> | ||||
| <t> | ||||
| LOCKT uses a lock_owner4 rather than a stateid4, as is used in | ||||
| LOCK, to identify the owner. This is because the client does not | ||||
| have to open the file to test for the existence of a lock, so | ||||
| a stateid might not be available. | ||||
| </t> | ||||
| <t> | ||||
| As noted in <xref target="OP_LOCK_IMPLEMENTATION" format="default"/>, some | ||||
| servers may return NFS4ERR_LOCK_RANGE to certain (otherwise | ||||
| non-conflicting) LOCK operations that overlap ranges already | ||||
| granted to the current lock-owner. | ||||
| </t> | ||||
| <t> | ||||
| The LOCKT operation's test for conflicting locks <bcp14>SHOULD</bcp14> exclude | ||||
| locks for the current lock-owner, and thus should return NFS4_OK in | ||||
| such cases. Note that this means that a server might return | ||||
| NFS4_OK to a LOCKT request even though a LOCK operation for the | ||||
| same range and lock-owner would fail with NFS4ERR_LOCK_RANGE. | ||||
| </t> | ||||
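| <t> | ||||
| The conflict test described above can be sketched in Python as follows. This is an illustrative sketch, not a normative algorithm: the HeldLock record and the lockt() helper are hypothetical names, the numeric nfs_lock_type4 values are taken from the protocol's XDR definition rather than from this section, and the treatment of an all-ones length as "to end of file" follows the range rules given for LOCK. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Length value meaning "lock to end of file" (range rules as for LOCK).
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF

# nfs_lock_type4 values (assumed from the protocol's XDR definition).
READ_LT, WRITE_LT, READW_LT, WRITEW_LT = 1, 2, 3, 4


@dataclass(frozen=True)
class HeldLock:
    """Hypothetical server-side record of one granted byte-range lock."""
    owner: bytes   # opaque identity of the lock-owner
    write: bool    # True for a write lock, False for a read lock
    offset: int
    length: int    # NFS4_UINT64_MAX means "to end of file"


def _end(offset: int, length: int) -> Optional[int]:
    """Exclusive end of a range; None stands for end of file."""
    return None if length == NFS4_UINT64_MAX else offset + length


def _overlaps(o1: int, l1: int, o2: int, l2: int) -> bool:
    e1, e2 = _end(o1, l1), _end(o2, l2)
    return (e2 is None or o1 < e2) and (e1 is None or o2 < e1)


def lockt(held: Iterable[HeldLock], locktype: int, offset: int,
          length: int, owner: bytes) -> Optional[HeldLock]:
    """Return a conflicting lock, or None (the NFS4_OK case)."""
    # READW_LT/WRITEW_LT are tested exactly like READ_LT/WRITE_LT.
    want_write = locktype in (WRITE_LT, WRITEW_LT)
    for lock in held:
        if lock.owner == owner:
            continue  # locks of the requesting lock-owner are excluded
        if not (want_write or lock.write):
            continue  # two read locks never conflict
        if _overlaps(offset, length, lock.offset, lock.length):
            return lock
    return None
```

| <t> | ||||
| Because locks of the requesting lock-owner are skipped, such a test can report no conflict for a range on which a later LOCK by the same lock-owner would still fail with NFS4ERR_LOCK_RANGE. | ||||
| </t> | ||||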
| <t> | ||||
| When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose | ||||
| (see <xref target="OP_LOCK_IMPLEMENTATION" format="default"/>) to handle LOCK | ||||
| requests locally. In such a case, LOCKT requests will similarly | ||||
| be handled locally. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LOCKU" numbered="true" toc="default"> | ||||
| <name>Operation 14: LOCKU - Unlock File</name> | ||||
| <section toc="exclude" anchor="OP_LOCKU_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOCKU4args { | ||||
| /* CURRENT_FH: file */ | ||||
| nfs_lock_type4 locktype; | ||||
| seqid4 seqid; | ||||
| stateid4 lock_stateid; | ||||
| offset4 offset; | ||||
| length4 length; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKU_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union LOCKU4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| stateid4 lock_stateid; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKU_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LOCKU operation unlocks the byte-range lock specified by the | ||||
| parameters. The client may set the locktype field to any value that is | ||||
| legal for the nfs_lock_type4 enumerated type, and the server <bcp14>MUST</bcp14> | ||||
| accept any legal value for locktype. Any legal value for locktype has | ||||
| no effect on the success or failure of the LOCKU operation. | ||||
| </t> | ||||
| <t> | ||||
| The ranges are specified as for LOCK. The NFS4ERR_INVAL and | ||||
| NFS4ERR_BAD_RANGE errors are returned under the same circumstances as | ||||
| for LOCK. | ||||
| </t> | ||||
| <t> | ||||
| The seqid parameter <bcp14>MAY</bcp14> be any value and the server <bcp14>MUST</bcp14> ignore it. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is not an ordinary file, an error will be | ||||
| returned to the client. In the case that the current filehandle | ||||
| represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. | ||||
| If the current filehandle designates a symbolic link, | ||||
| NFS4ERR_SYMLINK is returned. In all other cases, | ||||
| NFS4ERR_WRONG_TYPE is returned. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> require that the combination of principal, | ||||
| security flavor, and, if applicable, GSS mechanism | ||||
| that sent a LOCK operation also be the one to send | ||||
| LOCKU on the file. This might not be possible | ||||
| if credentials for the principal are no longer | ||||
| available. The server <bcp14>MAY</bcp14> allow the machine credential | ||||
| or SSV credential (see <xref target="OP_EXCHANGE_ID" format="default"/>) to send LOCKU. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOCKU_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the area to be unlocked does not correspond exactly to a lock | ||||
| actually held by the lock-owner, the server may return the error | ||||
| NFS4ERR_LOCK_RANGE. This includes the case in which the area is not | ||||
| locked, where the area is a sub-range of the area locked, where it | ||||
| overlaps the area locked without matching exactly, or the area | ||||
| specified includes multiple locks held by the lock-owner. In all of | ||||
| these cases, allowed by <xref target="fcntl" format="default">POSIX locking</xref> semantics, a client receiving | ||||
| this error should, if it desires support for such operations, simulate | ||||
| the operation using LOCKU on ranges corresponding to locks it actually | ||||
| holds, possibly followed by LOCK operations for the sub-ranges not being | ||||
| unlocked. | ||||
| </t> | ||||
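| <t> | ||||
| The client-side simulation described above might look like the following Python sketch. The emulate_unlock() helper and its tuple-based operation encoding are hypothetical, and the sketch assumes finite lengths only (no "to end of file" ranges). It releases every held lock that overlaps the requested area and then re-acquires the pieces that were not meant to be unlocked. | ||||
| </t> | ||||

```python
def emulate_unlock(held, offset, length):
    """Plan LOCKU/LOCK operations for unlocking [offset, offset + length).

    held: list of (offset, length, locktype) tuples describing locks
    actually held by one lock-owner (finite lengths assumed).
    Returns a list of ("LOCKU", offset, length) and
    ("LOCK", locktype, offset, length) tuples, in the order to send them.
    """
    end = offset + length
    unlocks, relocks = [], []
    for h_off, h_len, h_type in held:
        h_end = h_off + h_len
        if h_end <= offset or end <= h_off:
            continue  # no overlap: leave this lock alone
        # Release the held lock using its exact range.
        unlocks.append(("LOCKU", h_off, h_len))
        if h_off < offset:
            # Re-acquire the piece before the unlocked area.
            relocks.append(("LOCK", h_type, h_off, offset - h_off))
        if end < h_end:
            # Re-acquire the piece after the unlocked area.
            relocks.append(("LOCK", h_type, end, h_end - end))
    return unlocks + relocks
```

| <t> | ||||
| Between the LOCKU and the subsequent LOCK operations, another client may acquire a conflicting lock, so a simulation of this kind cannot fully reproduce the atomic behavior of a server that supports such operations natively. | ||||
| </t> | ||||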
| <t> | ||||
| When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose | ||||
| (see <xref target="OP_LOCK_IMPLEMENTATION" format="default"/>) to handle LOCK | ||||
| requests locally. In such a case, LOCKU operations will similarly | ||||
| be handled locally. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LOOKUP" numbered="true" toc="default"> | ||||
| <name>Operation 15: LOOKUP - Lookup Filename</name> | ||||
| <section toc="exclude" anchor="OP_LOOKUP_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOOKUP4args { | ||||
| /* CURRENT_FH: directory */ | ||||
| component4 objname; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUP_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOOKUP4res { | ||||
| /* New CURRENT_FH: object */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUP_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LOOKUP operation looks up or finds a file system object using the | ||||
| directory specified by the current filehandle. LOOKUP evaluates the | ||||
| component and if the object exists, the current filehandle is replaced | ||||
| with the component's filehandle. | ||||
| </t> | ||||
| <t> | ||||
| If the component cannot be evaluated either because it does not exist | ||||
| or because the client does not have permission to evaluate the | ||||
| component, then an error will be returned and the current filehandle | ||||
| will be unchanged. | ||||
| </t> | ||||
| <t> | ||||
| If the component is a zero-length string or if any component does not | ||||
| obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUP_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the client wants to achieve the effect of a multi-component lookup, | ||||
| it may construct a COMPOUND request such as the following, which also | ||||
| obtains the filehandle of each component: | ||||
| </t> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| PUTFH (directory filehandle) | ||||
| LOOKUP "pub" | ||||
| GETFH | ||||
| LOOKUP "foo" | ||||
| GETFH | ||||
| LOOKUP "bar" | ||||
| GETFH]]></sourcecode> | ||||
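| <t> | ||||
| As an illustration, a client could derive such an operation list from a slash-separated path with a small helper like the Python sketch below. The build_lookup_compound() function and its tuple encoding of operations are hypothetical. The sketch mirrors the example above and therefore omits the SEQUENCE operation that would precede these operations in an actual NFSv4.1 request. | ||||
| </t> | ||||

```python
def build_lookup_compound(dir_fh, path):
    """Return the operation list for a multi-component lookup.

    dir_fh: filehandle of the starting directory (opaque bytes).
    path:   components separated by "/", relative to dir_fh.
    """
    ops = [("PUTFH", dir_fh)]
    for name in path.split("/"):
        if not name:
            # A zero-length component would be rejected by the server
            # with NFS4ERR_INVAL, so refuse to build the request.
            raise ValueError("zero-length component in path")
        ops.append(("LOOKUP", name))
        ops.append(("GETFH",))  # capture this component's filehandle
    return ops
```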
| <t> | ||||
| Unlike NFSv3, NFSv4.1 allows LOOKUP requests to cross mountpoints on the | ||||
| server. The client can detect a mountpoint crossing by comparing the | ||||
| fsid attribute of the directory with the fsid attribute of the | ||||
| directory looked up. If the fsids are different, then the new | ||||
| directory is a server mountpoint. UNIX clients that detect a | ||||
| mountpoint crossing will need to mount the server's file system. This | ||||
| needs to be done to maintain the file object identity checking | ||||
| mechanisms common to UNIX clients. | ||||
| </t> | ||||
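| <t> | ||||
| The fsid comparison can be sketched in Python as follows. The Fsid type mirrors the two 64-bit fields of the fsid attribute, while find_crossings() is a hypothetical helper that walks the results of a multi-component lookup and reports the names at which the fsid changed. | ||||
| </t> | ||||

```python
from typing import NamedTuple


class Fsid(NamedTuple):
    """The two 64-bit values that make up the fsid attribute."""
    major: int
    minor: int


def find_crossings(start_fsid, looked_up):
    """Return the names whose fsid differs from that of their parent.

    start_fsid: Fsid of the directory the lookup started from.
    looked_up:  iterable of (name, Fsid) pairs in lookup order.
    """
    crossings = []
    parent = start_fsid
    for name, fsid in looked_up:
        if fsid != parent:
            crossings.append(name)  # this directory is a server mountpoint
        parent = fsid
    return crossings
```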
| <t> | ||||
| Servers that limit NFS access to "shared" or "exported" file systems | ||||
| should provide a pseudo file system into which the exported file systems | ||||
| can be integrated, so that clients can browse the server's namespace. | ||||
| The client's view of a pseudo file system will be limited to paths that | ||||
| lead to exported file systems. | ||||
| </t> | ||||
| <t> | ||||
| Note: previous versions of the protocol assigned special semantics to | ||||
| the names "." and "..". NFSv4.1 assigns no special semantics to | ||||
| these names. The LOOKUPP operation must be used to look up a parent | ||||
| directory. | ||||
| </t> | ||||
| <t> | ||||
| Note that this operation does not follow symbolic links. The client | ||||
| is responsible for all parsing of filenames including filenames that | ||||
| are modified by symbolic links encountered during the lookup process. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle supplied is not a directory but a symbolic | ||||
| link, the error NFS4ERR_SYMLINK is returned. For all | ||||
| other non-directory file types, the error NFS4ERR_NOTDIR is returned. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LOOKUPP" numbered="true" toc="default"> | ||||
| <name>Operation 16: LOOKUPP - Lookup Parent Directory</name> | ||||
| <section toc="exclude" anchor="OP_LOOKUPP_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* CURRENT_FH: object */ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUPP_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LOOKUPP4res { | ||||
| /* new CURRENT_FH: parent directory */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUPP_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The current filehandle is assumed to refer to a regular | ||||
| directory or a named attribute directory. LOOKUPP assigns the | ||||
| filehandle for its parent directory to be the current | ||||
| filehandle. If there is no parent directory, an NFS4ERR_NOENT | ||||
| error must be returned. Therefore, NFS4ERR_NOENT will be | ||||
| returned by the server when the current filehandle is at the | ||||
| root or top of the server's file tree. | ||||
| </t> | ||||
| <t> | ||||
| As is the case with LOOKUP, LOOKUPP will also cross mountpoints. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is not a directory or named attribute | ||||
| directory, the error NFS4ERR_NOTDIR is returned. | ||||
| </t> | ||||
| <t> | ||||
| If the requester's security flavor does not match that | ||||
| configured for the parent directory, then the server <bcp14>SHOULD</bcp14> | ||||
| return NFS4ERR_WRONGSEC (a future minor revision of NFSv4 may | ||||
| upgrade this to <bcp14>MUST</bcp14>) in the LOOKUPP response. However, if the | ||||
| server does so, it <bcp14>MUST</bcp14> support the SECINFO_NO_NAME | ||||
| operation (<xref target="OP_SECINFO_NO_NAME" format="default"/>), so that the client can gracefully determine the | ||||
| correct security flavor. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is a named attribute directory that is | ||||
| associated with a file system object via OPENATTR (i.e., not a | ||||
| sub-directory of a named attribute directory), LOOKUPP <bcp14>SHOULD</bcp14> | ||||
| return the filehandle of the associated file system object. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LOOKUPP_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| An issue to note is upward navigation from named attribute | ||||
| directories. The named attribute directories are essentially | ||||
| detached from the namespace, and this property should be safely | ||||
| represented in the client operating environment. LOOKUPP on a | ||||
| named attribute directory may return the filehandle of the | ||||
| associated file, and conveying this to applications might be | ||||
| unsafe as many applications expect the parent of an object to | ||||
| always be a directory. Therefore, the client may want to hide | ||||
| the parent of named attribute directories (represented as ".." | ||||
| in UNIX) or represent the named attribute directory as its own | ||||
| parent (as is typically done for the file system root directory in | ||||
| UNIX). | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_NVERIFY" numbered="true" toc="default"> | ||||
| <name>Operation 17: NVERIFY - Verify Difference in Attributes</name> | ||||
| <section toc="exclude" anchor="OP_NVERIFY_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct NVERIFY4args { | ||||
| /* CURRENT_FH: object */ | ||||
| fattr4 obj_attributes; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_NVERIFY_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct NVERIFY4res { | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_NVERIFY_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is used to prefix a sequence of operations to be | ||||
| performed if one or more attributes have changed on some file system | ||||
| object. If all the attributes match, then the error NFS4ERR_SAME <bcp14>MUST</bcp14> | ||||
| be returned. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_NVERIFY_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| This operation is useful as a cache validation operator. If the | ||||
| object to which the attributes belong has changed, then the following | ||||
| operations may obtain new data associated with that object, for | ||||
| instance, to check if a file has been changed and obtain new data if | ||||
| it has: | ||||
| </t> | ||||
| <sourcecode type="nfsv4compound"><![CDATA[ | ||||
| SEQUENCE | ||||
| PUTFH fh | ||||
| NVERIFY attrbits attrs | ||||
| READ 0 32767]]></sourcecode> | ||||
| <t> | ||||
| Contrast this with NFSv3, which would first send a GETATTR in | ||||
| one request/reply round trip and then, if the attributes indicated that | ||||
| the client's cache was stale, send a READ in another request/reply | ||||
| round trip. | ||||
| </t> | ||||
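| <t> | ||||
| On the client side, the reply to such a COMPOUND could be interpreted along the lines of the Python sketch below. The interpret_revalidation() helper and the (operation, status, payload) tuples are hypothetical. The sketch relies only on two facts: COMPOUND processing stops at the first operation that fails, and NVERIFY fails with NFS4ERR_SAME when all the attributes match. | ||||
| </t> | ||||

```python
def interpret_revalidation(results, cached_data):
    """Decide whether cached file data is still valid.

    results: list of (op_name, status, payload) tuples, ending with the
             first failing operation, if any.
    Returns (data, refreshed): the data to use, and whether it came from
    the server rather than from the cache.
    """
    for op, status, payload in results:
        if op == "NVERIFY" and status == "NFS4ERR_SAME":
            # Attributes are unchanged, so the READ was never executed.
            return cached_data, False
        if status != "NFS4_OK":
            raise RuntimeError(f"{op} failed with {status}")
        if op == "READ":
            # Attributes differed, so the server returned fresh data.
            return payload, True
    raise RuntimeError("compound reply contained no READ result")
```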
| <t> | ||||
| In the case that a <bcp14>RECOMMENDED</bcp14> attribute is specified in the NVERIFY | ||||
| operation and the server does not support that attribute for the | ||||
| file system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | ||||
| client. | ||||
| </t> | ||||
| <t> | ||||
| When the attribute rdattr_error or any set-only attribute (e.g., | ||||
| time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | ||||
| the client. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_OPEN" numbered="true" toc="default"> | ||||
| <name>Operation 18: OPEN - Open a Regular File</name> | ||||
| <section toc="exclude" anchor="OP_OPEN_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * Various definitions for OPEN | ||||
| */ | ||||
| enum createmode4 { | ||||
| UNCHECKED4 = 0, | ||||
| GUARDED4 = 1, | ||||
| /* Deprecated in NFSv4.1. */ | ||||
| EXCLUSIVE4 = 2, | ||||
| /* | ||||
| * New to NFSv4.1. If session is persistent, | ||||
| * GUARDED4 MUST be used. Otherwise, use | ||||
| * EXCLUSIVE4_1 instead of EXCLUSIVE4. | ||||
| */ | ||||
| EXCLUSIVE4_1 = 3 | ||||
| }; | ||||
| struct creatverfattr { | ||||
| verifier4 cva_verf; | ||||
| fattr4 cva_attrs; | ||||
| }; | ||||
| union createhow4 switch (createmode4 mode) { | ||||
| case UNCHECKED4: | ||||
| case GUARDED4: | ||||
| fattr4 createattrs; | ||||
| case EXCLUSIVE4: | ||||
| verifier4 createverf; | ||||
| case EXCLUSIVE4_1: | ||||
| creatverfattr ch_createboth; | ||||
| }; | ||||
| enum opentype4 { | ||||
| OPEN4_NOCREATE = 0, | ||||
| OPEN4_CREATE = 1 | ||||
| }; | ||||
| union openflag4 switch (opentype4 opentype) { | ||||
| case OPEN4_CREATE: | ||||
| createhow4 how; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| /* Next definitions used for OPEN delegation */ | ||||
| enum limit_by4 { | ||||
| NFS_LIMIT_SIZE = 1, | ||||
| NFS_LIMIT_BLOCKS = 2 | ||||
| /* others as needed */ | ||||
| }; | ||||
| struct nfs_modified_limit4 { | ||||
| uint32_t num_blocks; | ||||
| uint32_t bytes_per_block; | ||||
| }; | ||||
| union nfs_space_limit4 switch (limit_by4 limitby) { | ||||
| /* limit specified as file size */ | ||||
| case NFS_LIMIT_SIZE: | ||||
| uint64_t filesize; | ||||
| /* limit specified by number of blocks */ | ||||
| case NFS_LIMIT_BLOCKS: | ||||
| nfs_modified_limit4 mod_blocks; | ||||
| } ; | ||||
| /* | ||||
| * Share Access and Deny constants for open argument | ||||
| */ | ||||
| const OPEN4_SHARE_ACCESS_READ = 0x00000001; | ||||
| const OPEN4_SHARE_ACCESS_WRITE = 0x00000002; | ||||
| const OPEN4_SHARE_ACCESS_BOTH = 0x00000003; | ||||
| const OPEN4_SHARE_DENY_NONE = 0x00000000; | ||||
| const OPEN4_SHARE_DENY_READ = 0x00000001; | ||||
| const OPEN4_SHARE_DENY_WRITE = 0x00000002; | ||||
| const OPEN4_SHARE_DENY_BOTH = 0x00000003; | ||||
| /* new flags for share_access field of OPEN4args */ | ||||
| const OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00; | ||||
| const OPEN4_SHARE_ACCESS_WANT_NO_PREFERENCE = 0x0000; | ||||
| const OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100; | ||||
| const OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200; | ||||
| const OPEN4_SHARE_ACCESS_WANT_ANY_DELEG = 0x0300; | ||||
| const OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400; | ||||
| const OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500; | ||||
| const | ||||
| OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | ||||
| = 0x10000; | ||||
| const | ||||
| OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | ||||
| = 0x20000; | ||||
| enum open_delegation_type4 { | ||||
| OPEN_DELEGATE_NONE = 0, | ||||
| OPEN_DELEGATE_READ = 1, | ||||
| OPEN_DELEGATE_WRITE = 2, | ||||
| OPEN_DELEGATE_NONE_EXT = 3 /* new to v4.1 */ | ||||
| }; | ||||
| enum open_claim_type4 { | ||||
| /* | ||||
| * Not a reclaim. | ||||
| */ | ||||
| CLAIM_NULL = 0, | ||||
| CLAIM_PREVIOUS = 1, | ||||
| CLAIM_DELEGATE_CUR = 2, | ||||
| CLAIM_DELEGATE_PREV = 3, | ||||
| /* | ||||
| * Not a reclaim. | ||||
| * | ||||
| * Like CLAIM_NULL, but object identified | ||||
| * by the current filehandle. | ||||
| */ | ||||
| CLAIM_FH = 4, /* new to v4.1 */ | ||||
| /* | ||||
| * Like CLAIM_DELEGATE_CUR, but object identified | ||||
| * by current filehandle. | ||||
| */ | ||||
| CLAIM_DELEG_CUR_FH = 5, /* new to v4.1 */ | ||||
| /* | ||||
| * Like CLAIM_DELEGATE_PREV, but object identified | ||||
| * by current filehandle. | ||||
| */ | ||||
| CLAIM_DELEG_PREV_FH = 6 /* new to v4.1 */ | ||||
| }; | ||||
| struct open_claim_delegate_cur4 { | ||||
| stateid4 delegate_stateid; | ||||
| component4 file; | ||||
| }; | ||||
| union open_claim4 switch (open_claim_type4 claim) { | ||||
| /* | ||||
| * No special rights to file. | ||||
| * Ordinary OPEN of the specified file. | ||||
| */ | ||||
| case CLAIM_NULL: | ||||
| /* CURRENT_FH: directory */ | ||||
| component4 file; | ||||
| /* | ||||
| * Right to the file established by an | ||||
| * open previous to server reboot. File | ||||
| * identified by filehandle obtained at | ||||
| * that time rather than by name. | ||||
| */ | ||||
| case CLAIM_PREVIOUS: | ||||
| /* CURRENT_FH: file being reclaimed */ | ||||
| open_delegation_type4 delegate_type; | ||||
| /* | ||||
| * Right to file based on a delegation | ||||
| * granted by the server. File is | ||||
| * specified by name. | ||||
| */ | ||||
| case CLAIM_DELEGATE_CUR: | ||||
| /* CURRENT_FH: directory */ | ||||
| open_claim_delegate_cur4 delegate_cur_info; | ||||
| /* | ||||
| * Right to file based on a delegation | ||||
| * granted to a previous boot instance | ||||
| * of the client. File is specified by name. | ||||
| */ | ||||
| case CLAIM_DELEGATE_PREV: | ||||
| /* CURRENT_FH: directory */ | ||||
| component4 file_delegate_prev; | ||||
| /* | ||||
| * Like CLAIM_NULL. No special rights | ||||
| * to file. Ordinary OPEN of the | ||||
| * specified file by current filehandle. | ||||
| */ | ||||
| case CLAIM_FH: /* new to v4.1 */ | ||||
| /* CURRENT_FH: regular file to open */ | ||||
| void; | ||||
| /* | ||||
| * Like CLAIM_DELEGATE_PREV. Right to file based on a | ||||
| * delegation granted to a previous boot | ||||
| * instance of the client. File is identified | ||||
| * by filehandle. | ||||
| */ | ||||
| case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ | ||||
| /* CURRENT_FH: file being opened */ | ||||
| void; | ||||
| /* | ||||
| * Like CLAIM_DELEGATE_CUR. Right to file based on | ||||
| * a delegation granted by the server. | ||||
| * File is identified by filehandle. | ||||
| */ | ||||
| case CLAIM_DELEG_CUR_FH: /* new to v4.1 */ | ||||
| /* CURRENT_FH: file being opened */ | ||||
| stateid4 oc_delegate_stateid; | ||||
| }; | ||||
| /* | ||||
| * OPEN: Open a file, potentially receiving an OPEN delegation | ||||
| */ | ||||
| struct OPEN4args { | ||||
| seqid4 seqid; | ||||
| uint32_t share_access; | ||||
| uint32_t share_deny; | ||||
| open_owner4 owner; | ||||
| openflag4 openhow; | ||||
| open_claim4 claim; | ||||
| };]]></sourcecode> | ||||
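| <t> | ||||
| Since the share_access field carries the access bits, the delegation "want" value, and two additional flag bits, an implementation will typically separate them with masks. The following Python sketch uses the constant values defined above. The split_share_access() helper is a hypothetical name, and the sketch is illustrative rather than normative. | ||||
| </t> | ||||

```python
# Values copied from the XDR definitions above.
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200
OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL = 0x10000
OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED = 0x20000


def split_share_access(share_access):
    """Split share_access into (access, want, signal, push)."""
    # Low two bits: READ, WRITE, or BOTH.
    access = share_access & OPEN4_SHARE_ACCESS_BOTH
    # Second byte: one of the OPEN4_SHARE_ACCESS_WANT_* values.
    want = share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    signal = bool(share_access
                  & OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL)
    push = bool(share_access
                & OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED)
    return access, want, signal, push
```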
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct open_read_delegation4 { | ||||
| stateid4 stateid; /* Stateid for delegation*/ | ||||
| bool recall; /* Pre-recalled flag for | ||||
| delegations obtained | ||||
| by reclaim (CLAIM_PREVIOUS) */ | ||||
| nfsace4 permissions; /* Defines users who don't | ||||
| need an ACCESS call to | ||||
| open for read */ | ||||
| }; | ||||
| struct open_write_delegation4 { | ||||
| stateid4 stateid; /* Stateid for delegation */ | ||||
| bool recall; /* Pre-recalled flag for | ||||
| delegations obtained | ||||
| by reclaim | ||||
| (CLAIM_PREVIOUS) */ | ||||
| nfs_space_limit4 | ||||
| space_limit; /* Defines condition that | ||||
| the client must check to | ||||
| determine whether the | ||||
| file needs to be flushed | ||||
| to the server on close. */ | ||||
| nfsace4 permissions; /* Defines users who don't | ||||
| need an ACCESS call as | ||||
| part of a delegated | ||||
| open. */ | ||||
| }; | ||||
| enum why_no_delegation4 { /* new to v4.1 */ | ||||
| WND4_NOT_WANTED = 0, | ||||
| WND4_CONTENTION = 1, | ||||
| WND4_RESOURCE = 2, | ||||
| WND4_NOT_SUPP_FTYPE = 3, | ||||
| WND4_WRITE_DELEG_NOT_SUPP_FTYPE = 4, | ||||
| WND4_NOT_SUPP_UPGRADE = 5, | ||||
| WND4_NOT_SUPP_DOWNGRADE = 6, | ||||
| WND4_CANCELLED = 7, | ||||
| WND4_IS_DIR = 8 | ||||
| }; | ||||
| union open_none_delegation4 /* new to v4.1 */ | ||||
| switch (why_no_delegation4 ond_why) { | ||||
| case WND4_CONTENTION: | ||||
| bool ond_server_will_push_deleg; | ||||
| case WND4_RESOURCE: | ||||
| bool ond_server_will_signal_avail; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| union open_delegation4 | ||||
| switch (open_delegation_type4 delegation_type) { | ||||
| case OPEN_DELEGATE_NONE: | ||||
| void; | ||||
| case OPEN_DELEGATE_READ: | ||||
| open_read_delegation4 read; | ||||
| case OPEN_DELEGATE_WRITE: | ||||
| open_write_delegation4 write; | ||||
| case OPEN_DELEGATE_NONE_EXT: /* new to v4.1 */ | ||||
| open_none_delegation4 od_whynone; | ||||
| }; | ||||
| /* | ||||
| * Result flags | ||||
| */ | ||||
| /* Client must confirm open */ | ||||
| const OPEN4_RESULT_CONFIRM = 0x00000002; | ||||
| /* Type of file locking behavior at the server */ | ||||
| const OPEN4_RESULT_LOCKTYPE_POSIX = 0x00000004; | ||||
| /* Server will preserve file if removed while open */ | ||||
| const OPEN4_RESULT_PRESERVE_UNLINKED = 0x00000008; | ||||
| /* | ||||
| * Server may use CB_NOTIFY_LOCK on locks | ||||
| * derived from this open | ||||
| */ | ||||
| const OPEN4_RESULT_MAY_NOTIFY_LOCK = 0x00000020; | ||||
| struct OPEN4resok { | ||||
| stateid4 stateid; /* Stateid for open */ | ||||
| change_info4 cinfo; /* Directory Change Info */ | ||||
| uint32_t rflags; /* Result flags */ | ||||
| bitmap4 attrset; /* attribute set for create*/ | ||||
| open_delegation4 delegation; /* Info on any open | ||||
| delegation */ | ||||
| }; | ||||
| union OPEN4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| /* New CURRENT_FH: opened file */ | ||||
| OPEN4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The OPEN operation opens a regular file in a | ||||
| directory with the provided name or filehandle. | ||||
| OPEN can also create a file if a name is provided, | ||||
| and the client specifies it wants to create a file. | ||||
| Whether a file is to be created, and by what method, | ||||
| is specified via the openhow | ||||
| parameter. The openhow parameter consists of | ||||
| a switched union (data type openflag4), which | ||||
| switches on the value of opentype (OPEN4_NOCREATE | ||||
| or OPEN4_CREATE). If OPEN4_CREATE is specified, | ||||
| this leads to another switched union (data type | ||||
| createhow4) that supports four cases of creation | ||||
| methods: UNCHECKED4, GUARDED4, EXCLUSIVE4, | ||||
| or EXCLUSIVE4_1. If opentype is OPEN4_CREATE, | ||||
| then the claim type of the claim field | ||||
| <bcp14>MUST</bcp14> be one of CLAIM_NULL, CLAIM_DELEGATE_CUR, or | ||||
| CLAIM_DELEGATE_PREV, because these claim methods | ||||
| include a component of a file name. | ||||
| </t> | ||||
| <t> | ||||
| Upon success (which might entail creation of a new | ||||
| file), the current filehandle is replaced by that | ||||
| of the created or existing object. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle is a named attribute | ||||
| directory, OPEN will then create or open a named | ||||
| attribute file. Note that exclusive create | ||||
| of a named attribute is not supported. If the | ||||
| createmode is EXCLUSIVE4 or EXCLUSIVE4_1 and the | ||||
| current filehandle is a named attribute directory, | ||||
| the server will return NFS4ERR_INVAL. | ||||
| </t> | ||||
| <t> | ||||
| UNCHECKED4 means that the file should be created if a | ||||
| file of that name does not exist and encountering an | ||||
| existing regular file of that name is not an error. | ||||
| For this type of create, createattrs specifies the | ||||
| initial set of attributes for the file. The set | ||||
| of attributes may include any writable attribute | ||||
| valid for regular files. When an UNCHECKED4 | ||||
| create encounters an existing file, the attributes | ||||
| specified by createattrs are not used, except that | ||||
| when createattrs specifies the size attribute | ||||
| with a size of zero, the existing file is truncated. | ||||
| </t> | ||||
| <t> | ||||
| If GUARDED4 is specified, the server checks for | ||||
| the presence of a duplicate object by name before | ||||
| performing the create. If a duplicate exists, | ||||
| NFS4ERR_EXIST is returned. | ||||
| If the object does not exist, the request is | ||||
| performed as described for UNCHECKED4. | ||||
| </t> | ||||
| <t> | ||||
| For the UNCHECKED4 and GUARDED4 cases, where the | ||||
| operation is successful, the server will return | ||||
| to the client an attribute mask signifying which | ||||
| attributes were successfully set for the object. | ||||
| </t> | ||||
| <t> | ||||
| EXCLUSIVE4_1 and EXCLUSIVE4 | ||||
| specify that the server is to follow exclusive | ||||
| creation semantics, using the verifier to ensure | ||||
| exclusive creation of the target. The server should | ||||
| check for the presence of a duplicate object by name. | ||||
| If the object does not exist, the server creates | ||||
| the object and stores the verifier with the object. | ||||
| If the object does exist and the stored verifier | ||||
| matches the client provided verifier, the server | ||||
| uses the existing object as the newly created object. | ||||
| If the stored verifier does not match, then an error | ||||
| of NFS4ERR_EXIST is returned. | ||||
| </t> | ||||
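| <t> | ||||
| As a rough illustration of the verifier comparison described above, | ||||
| the following Python sketch models the server's decision with an | ||||
| in-memory dictionary. The function name exclusive_create and the | ||||
| store argument are hypothetical and not part of the protocol; a real | ||||
| server keeps the verifier in stable storage. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Hypothetical in-memory model of exclusive create; not a server
# implementation. Status values are spelled out as strings.

def exclusive_create(store, name, verifier):
    """Return (status, created) for an EXCLUSIVE4 or EXCLUSIVE4_1 create.

    store maps an object name to the verifier saved at creation time.
    """
    if name not in store:
        # No duplicate: create the object and remember the verifier.
        store[name] = verifier
        return "NFS4_OK", True
    if store[name] == verifier:
        # Same verifier: treat the existing object as the newly
        # created one (typically a retransmitted request).
        return "NFS4_OK", False
    # Different verifier: the name belongs to some other create.
    return "NFS4ERR_EXIST", False
```
| ]]></sourcecode> | ||||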
| <t> | ||||
| If using EXCLUSIVE4, and if the server uses attributes to | ||||
| store the exclusive create verifier, the server will signify | ||||
| which attributes it used by setting the appropriate bits in | ||||
| the attribute mask that is returned in the results. | ||||
| Unlike UNCHECKED4, GUARDED4, and EXCLUSIVE4_1, EXCLUSIVE4 does | ||||
| not support the setting of attributes at file creation, and | ||||
| after a successful OPEN via EXCLUSIVE4, the client <bcp14>MUST</bcp14> | ||||
| send a SETATTR to set attributes to a known state. | ||||
| </t> | ||||
| <t> | ||||
| In NFSv4.1, EXCLUSIVE4 has been deprecated in favor | ||||
| of EXCLUSIVE4_1. | ||||
| Unlike EXCLUSIVE4, attributes may be provided | ||||
| in the EXCLUSIVE4_1 case, but because the server | ||||
| may use attributes of the target object to store | ||||
| the verifier, the set of allowable attributes | ||||
| may be fewer than the set of attributes SETATTR | ||||
| allows. The allowable attributes for EXCLUSIVE4_1 | ||||
| are indicated in the suppattr_exclcreat (<xref target="attrdef_suppattr_exclcreat" format="default"/>) attribute. If the client | ||||
| attempts to set in cva_attrs an attribute that is not in | ||||
| suppattr_exclcreat, the server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| The response field, attrset, indicates both which attributes | ||||
| the server set from cva_attrs and which attributes the | ||||
| server used to store the verifier. As described | ||||
| in <xref target="OP_OPEN_IMPLEMENTATION" format="default"/>, the client can compare | ||||
| cva_attrs.attrmask with attrset to determine which attributes | ||||
| were used to store the verifier. | ||||
| </t> | ||||
| <t> | ||||
| With the addition of persistent sessions and | ||||
| pNFS, under some conditions EXCLUSIVE4 <bcp14>MUST NOT</bcp14> | ||||
| be used by the client or supported by the server. | ||||
| The following table summarizes the appropriate and | ||||
| mandated exclusive create methods for implementations | ||||
| of NFSv4.1: | ||||
| </t> | ||||
| <table anchor="exclusive_create" align="center"> | ||||
| <name>Required Methods for Exclusive Create</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Persistent Reply Cache Enabled</th> | ||||
| <th align="left">Server Supports pNFS</th> | ||||
| <th align="left">Server <bcp14>REQUIRED</bcp14></th> | ||||
| <th align="left">Client Allowed</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">no</td> | ||||
| <td align="left">no</td> | ||||
| <td align="left">EXCLUSIVE4_1 and EXCLUSIVE4</td> | ||||
| <td align="left">EXCLUSIVE4_1 (<bcp14>SHOULD</bcp14>) or EXCLUSIVE4 (<bcp14>SHOULD NOT</bcp14>)</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">no</td> | ||||
| <td align="left">yes</td> | ||||
| <td align="left">EXCLUSIVE4_1</td> | ||||
| <td align="left">EXCLUSIVE4_1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">yes</td> | ||||
| <td align="left">no</td> | ||||
| <td align="left">GUARDED4</td> | ||||
| <td align="left">GUARDED4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">yes</td> | ||||
| <td align="left">yes</td> | ||||
| <td align="left">GUARDED4</td> | ||||
| <td align="left">GUARDED4</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| If CREATE_SESSION4_FLAG_PERSIST is set in the results | ||||
| of CREATE_SESSION, the reply cache is persistent (see <xref target="OP_CREATE_SESSION" format="default"/>). | ||||
| If the EXCHGID4_FLAG_USE_PNFS_MDS flag is set in the | ||||
| results from EXCHANGE_ID, the server is a pNFS server (see <xref target="OP_EXCHANGE_ID" format="default"/>). | ||||
| If the client attempts to use EXCLUSIVE4 on a persistent session, | ||||
| or a session derived from an | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS client ID, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_INVAL. | ||||
| </t> | ||||
| <t> | ||||
| With persistent sessions, exclusive create semantics | ||||
| are fully achievable via GUARDED4, and so EXCLUSIVE4 | ||||
| or EXCLUSIVE4_1 <bcp14>MUST NOT</bcp14> be used. When pNFS is | ||||
| being used, the layout_hint attribute might | ||||
| not be supported after the file is created. Only the | ||||
| EXCLUSIVE4_1 and GUARDED4 methods of exclusive file | ||||
| creation allow the atomic setting of attributes. | ||||
| </t> | ||||
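| <t> | ||||
| The table of required methods can be read as a small decision | ||||
| function. The Python sketch below is illustrative only; the function | ||||
| name exclusive_create_methods and its boolean parameters are | ||||
| hypothetical, standing in for CREATE_SESSION4_FLAG_PERSIST and | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS as reported by the server. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Hypothetical helper mirroring the "Required Methods for Exclusive
# Create" table; method names are plain strings.

def exclusive_create_methods(persistent_reply_cache, pnfs):
    """Return (server_required, client_allowed) as tuples of names."""
    if persistent_reply_cache:
        # With a persistent reply cache, GUARDED4 is sufficient,
        # whether or not pNFS is in use.
        return ("GUARDED4",), ("GUARDED4",)
    if pnfs:
        return ("EXCLUSIVE4_1",), ("EXCLUSIVE4_1",)
    # Neither: the client SHOULD use EXCLUSIVE4_1 and SHOULD NOT
    # use EXCLUSIVE4, but the server is required to support both.
    both = ("EXCLUSIVE4_1", "EXCLUSIVE4")
    return both, both
```
| ]]></sourcecode> | ||||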
| <t> | ||||
| For the target directory, the server returns change_info4 information | ||||
| in cinfo. With the atomic field of the change_info4 data type, the | ||||
| server will indicate if the before and after change attributes were | ||||
| obtained atomically with respect to the link creation. | ||||
| </t> | ||||
| <t> | ||||
| The OPEN operation provides for Windows share | ||||
| reservation capability with the use of the | ||||
| share_access and share_deny fields of the OPEN | ||||
| arguments. The client specifies at OPEN the required | ||||
| share_access and share_deny modes. For clients | ||||
| that do not directly support SHAREs (e.g., UNIX), the | ||||
| expected deny value is OPEN4_SHARE_DENY_NONE. In the case that | ||||
| there is an existing SHARE reservation that conflicts | ||||
| with the OPEN request, the server returns the error | ||||
| NFS4ERR_SHARE_DENIED. For additional discussion of | ||||
| SHARE semantics, see <xref target="share_reserve" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| For each OPEN, the client provides a value for | ||||
| the owner field of the OPEN argument. The owner | ||||
| field is of data type open_owner4, and contains a | ||||
| field called clientid and a field called owner. The | ||||
| client can set the clientid field to any value and | ||||
| the server <bcp14>MUST</bcp14> ignore it. Instead, the server <bcp14>MUST</bcp14> | ||||
| derive the client ID from the session ID of the | ||||
| SEQUENCE operation of the COMPOUND request. | ||||
| </t> | ||||
| <t> | ||||
| The "seqid" field of the request is not used in | ||||
| NFSv4.1, but it <bcp14>MAY</bcp14> be any value and the server <bcp14>MUST</bcp14> | ||||
| ignore it. | ||||
| </t> | ||||
| <t> | ||||
| In the case that the client is recovering state from a server failure, | ||||
| the claim field of the OPEN argument is used to signify that the | ||||
| request is meant to reclaim state previously held. | ||||
| </t> | ||||
| <t> | ||||
| The "claim" field of the OPEN argument is used to specify the file to | ||||
| be opened and the state information that the client claims to | ||||
| possess. There are seven claim types as follows: | ||||
| </t> | ||||
| <table align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">claim type</th> | ||||
| <th align="left">description</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| CLAIM_NULL, | ||||
| CLAIM_FH | ||||
| </td> | ||||
| <td align="left"> | ||||
| For the client, this is a new OPEN request and there is no | ||||
| previous state associated with the file for the client. With | ||||
| CLAIM_NULL, the file is identified by the current filehandle | ||||
| and the specified component name. With CLAIM_FH (new to NFSv4.1), | ||||
| the file is identified by just the current filehandle. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| CLAIM_PREVIOUS | ||||
| </td> | ||||
| <td align="left"> | ||||
| The client is claiming basic OPEN state for a file that was held | ||||
| previous to a server restart. Generally used when a server is | ||||
| returning persistent filehandles; the client may not have the file | ||||
| name to reclaim the OPEN. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| CLAIM_DELEGATE_CUR, | ||||
| CLAIM_DELEG_CUR_FH | ||||
| </td> | ||||
| <td align="left"> | ||||
| The client is claiming a delegation for OPEN | ||||
| as granted by the server. Generally, this | ||||
| is done as part of recalling a delegation. With | ||||
| CLAIM_DELEGATE_CUR, the file is identified by | ||||
| the current filehandle and the specified component | ||||
| name. With CLAIM_DELEG_CUR_FH (new to NFSv4.1), the | ||||
| file is identified by just the current filehandle. | ||||
| </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left"> | ||||
| CLAIM_DELEGATE_PREV, | ||||
| CLAIM_DELEG_PREV_FH | ||||
| </td> | ||||
| <td align="left"> | ||||
| The client is claiming a delegation granted to a | ||||
| previous client instance; used after the client | ||||
| restarts. The server <bcp14>MAY</bcp14> support CLAIM_DELEGATE_PREV | ||||
| and/or CLAIM_DELEG_PREV_FH (new to NFSv4.1). If it | ||||
| does support either claim type, CREATE_SESSION <bcp14>MUST | ||||
| NOT</bcp14> remove the client's delegation state, and the | ||||
| server <bcp14>MUST</bcp14> support the DELEGPURGE operation. | ||||
| </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| For OPEN requests that reach the server during | ||||
| the grace period, the server returns an error | ||||
| of NFS4ERR_GRACE. The following claim types are | ||||
| exceptions: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted to | ||||
| reclaiming opens after a server restart and are typically only | ||||
| valid during the grace period. | ||||
| </li> | ||||
| <li> | ||||
| OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and | ||||
| CLAIM_DELEG_CUR_FH are valid both during and after the grace period. | ||||
| Since the granting of the delegation that they are subordinate | ||||
| to assures that there is no conflict with locks to be reclaimed | ||||
| by other clients, the server need not return NFS4ERR_GRACE when | ||||
| these are received during the grace period. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| For any OPEN request, the server may return an OPEN delegation, which | ||||
| allows further opens and closes to be handled locally on the client as | ||||
| described in <xref target="open_delegation" format="default"/>. Note that delegation is | ||||
| up to the server to decide. The client should never assume that | ||||
| delegation will or will not be granted in a particular instance. It | ||||
| should always be prepared for either case. A partial exception is the | ||||
| reclaim (CLAIM_PREVIOUS) case, in which a delegation type is claimed. | ||||
| In this case, delegation will always be granted, although the server | ||||
| may specify an immediate recall in the delegation structure. | ||||
| </t> | ||||
| <t> | ||||
| The rflags returned by a successful OPEN allow the server to return | ||||
| information governing how the open file is to be handled. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| OPEN4_RESULT_CONFIRM is deprecated and <bcp14>MUST NOT</bcp14> be returned | ||||
| by an NFSv4.1 server. | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_RESULT_LOCKTYPE_POSIX indicates that the server's byte-range locking | ||||
| behavior supports the complete set of POSIX locking techniques <xref target="fcntl" format="default"/>. From | ||||
| this, the client can choose to manage byte-range locking state in a way to | ||||
| handle a mismatch of byte-range locking management. | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_RESULT_PRESERVE_UNLINKED indicates that the server will | ||||
| preserve the open file if the client (or any other client) | ||||
| removes the file as long as it is open. Furthermore, the | ||||
| server promises to preserve the file through the | ||||
| grace period after server restart, thereby giving the client | ||||
| the opportunity to reclaim its open. | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_RESULT_MAY_NOTIFY_LOCK indicates that the server may attempt | ||||
| CB_NOTIFY_LOCK callbacks for locks on this file. This flag is a hint | ||||
| only, and may be safely ignored by the client. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the component is of zero length, NFS4ERR_INVAL will be returned. | ||||
| The component is also subject to the normal UTF-8, character support, | ||||
| and name checks. See <xref target="utf8_related_errors" format="default"/> for | ||||
| further discussion. | ||||
| </t> | ||||
| <t> | ||||
| When an OPEN is done and the specified open-owner already has the | ||||
| resulting filehandle open, the result is to "OR" the new | ||||
| share and deny status together with the existing status. In this | ||||
| case, only a single CLOSE need be done, even though multiple OPENs | ||||
| were completed. When such an OPEN is done, checking of share | ||||
| reservations for the new OPEN proceeds normally, with no exception for | ||||
| the existing OPEN held by the same open-owner. In this case, the | ||||
| stateid returned has an "other" field that matches that of the previous | ||||
| open while the "seqid" field is incremented to reflect the changed | ||||
| status due to the new open. | ||||
| </t> | ||||
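| <t> | ||||
| A minimal Python sketch of this behavior follows. It assumes the | ||||
| usual share-reservation rule (requested access is tested against | ||||
| existing deny bits, and requested deny against existing access bits); | ||||
| the function names and the dictionary layout are hypothetical, and | ||||
| the access values are assumed to have any delegation "want" bits | ||||
| already masked off. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Low-order share bits as defined for OPEN4args.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2

def conflicts(new_access, new_deny, held_access, held_deny):
    """True if a new OPEN conflicts with an existing reservation."""
    return bool(new_access & held_deny) or bool(new_deny & held_access)

def merge_open(held, new_access, new_deny):
    """OR a repeated OPEN by the same open-owner into its state.

    held is a dict with 'access', 'deny', and 'seqid' keys
    (a hypothetical layout); it is updated in place and returned.
    """
    held["access"] |= new_access
    held["deny"] |= new_deny
    held["seqid"] += 1  # "other" is unchanged; only seqid advances
    return held
```
| ]]></sourcecode> | ||||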
| <t> | ||||
| If the underlying file system at the server is only accessible in a | ||||
| read-only mode and the OPEN request has specified OPEN4_SHARE_ACCESS_WRITE or | ||||
| OPEN4_SHARE_ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a | ||||
| read-only file system. | ||||
| </t> | ||||
| <t> | ||||
| As with the CREATE operation, the server <bcp14>MUST</bcp14> derive | ||||
| the owner, owner ACE, group, or group ACE if any | ||||
| of the four attributes are required and supported | ||||
| by the server's file system. For an OPEN with the | ||||
| EXCLUSIVE4 createmode, the server has no choice, | ||||
| since such OPEN calls do not include the createattrs | ||||
| field. Conversely, if createattrs (UNCHECKED4 or | ||||
| GUARDED4) or cva_attrs (EXCLUSIVE4_1) is specified, | ||||
| and includes an owner, owner_group, or ACE that | ||||
| the principal in the RPC call's credentials does | ||||
| not have authorization to create files for, then | ||||
| the server may return NFS4ERR_PERM. | ||||
| </t> | ||||
| <t> | ||||
| In the case of an OPEN that specifies a size of zero (i.e., truncation) | ||||
| and the file has named attributes, the named attributes are left as | ||||
| is and are not removed. | ||||
| </t> | ||||
| <t> | ||||
| NFSv4.1 gives more precise control to clients over | ||||
| acquisition of delegations via the following new | ||||
| flags for the share_access field of OPEN4args: | ||||
| </t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_READ_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_ANY_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_NO_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_CANCEL</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED</t> | ||||
| <t> | ||||
| If (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is | ||||
| not zero, then the client will have specified one and only one of: | ||||
| </t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_READ_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_ANY_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_NO_DELEG</t> | ||||
| <t>OPEN4_SHARE_ACCESS_WANT_CANCEL</t> | ||||
| <t> | ||||
| Otherwise, the client is neither indicating a desire nor a non-desire | ||||
| for a delegation, and the server <bcp14>MAY</bcp14> | ||||
| return a delegation in the OPEN response but is not | ||||
| required to. | ||||
| </t> | ||||
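| <t> | ||||
| The masking rule above can be sketched in Python as follows. The | ||||
| numeric values are believed to match the constants in the OPEN | ||||
| arguments and should be checked against that XDR; the function | ||||
| name delegation_want is hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Values assumed to match the OPEN4args XDR constants.
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
WANTS = {
    0x0100: "OPEN4_SHARE_ACCESS_WANT_READ_DELEG",
    0x0200: "OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG",
    0x0300: "OPEN4_SHARE_ACCESS_WANT_ANY_DELEG",
    0x0400: "OPEN4_SHARE_ACCESS_WANT_NO_DELEG",
    0x0500: "OPEN4_SHARE_ACCESS_WANT_CANCEL",
}

def delegation_want(share_access):
    """Return the single 'want' named by share_access, or None.

    None means the client expressed neither a desire nor a
    non-desire for a delegation.
    """
    want = share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    if want == 0:
        return None
    if want not in WANTS:
        raise ValueError("unrecognized want value: %#x" % want)
    return WANTS[want]
```
| ]]></sourcecode> | ||||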
| <t> | ||||
| If the server supports the new _WANT_ flags and the | ||||
| client sends one or more of the new flags, | ||||
| then in the event the server does not return a | ||||
| delegation, it <bcp14>MUST</bcp14> return a delegation type of | ||||
| OPEN_DELEGATE_NONE_EXT. The field ond_why in the reply | ||||
| indicates why | ||||
| no delegation was returned and will be one of: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>WND4_NOT_WANTED</dt> | ||||
| <dd> | ||||
| The client specified OPEN4_SHARE_ACCESS_WANT_NO_DELEG. | ||||
| </dd> | ||||
| <dt>WND4_CONTENTION</dt> | ||||
| <dd> | ||||
| There is a conflicting delegation or open on the file. | ||||
| </dd> | ||||
| <dt>WND4_RESOURCE</dt> | ||||
| <dd> | ||||
| Resource limitations prevent the server from granting a | ||||
| delegation. | ||||
| </dd> | ||||
| <dt>WND4_NOT_SUPP_FTYPE</dt> | ||||
| <dd> | ||||
| The server does not support delegations on this file type. | ||||
| </dd> | ||||
| <dt>WND4_WRITE_DELEG_NOT_SUPP_FTYPE</dt> | ||||
| <dd> | ||||
| The server does not support OPEN_DELEGATE_WRITE delegations on this file | ||||
| type. | ||||
| </dd> | ||||
| <dt>WND4_NOT_SUPP_UPGRADE</dt> | ||||
| <dd> | ||||
| The server does not support atomic upgrade of an OPEN_DELEGATE_READ delegation to an OPEN_DELEGATE_WRITE delegation. | ||||
| </dd> | ||||
| <dt>WND4_NOT_SUPP_DOWNGRADE</dt> | ||||
| <dd> | ||||
| The server does not support atomic downgrade of an OPEN_DELEGATE_WRITE delegation to an OPEN_DELEGATE_READ delegation. | ||||
| </dd> | ||||
| <dt>WND4_CANCELED</dt> | ||||
| <dd> | ||||
| The client specified OPEN4_SHARE_ACCESS_WANT_CANCEL and now | ||||
| any "want" for this file object is cancelled. | ||||
| </dd> | ||||
| <dt>WND4_IS_DIR</dt> | ||||
| <dd> | ||||
| The specified file object is a directory, and the operation | ||||
| is OPEN or WANT_DELEGATION, which do not support delegations | ||||
| on directories. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| OPEN4_SHARE_ACCESS_WANT_READ_DELEG, | ||||
| OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG, or | ||||
| OPEN4_SHARE_ACCESS_WANT_ANY_DELEG mean, respectively, the | ||||
| client wants an OPEN_DELEGATE_READ, OPEN_DELEGATE_WRITE, or any delegation regardless of which | ||||
| of OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or | ||||
| OPEN4_SHARE_ACCESS_BOTH is set. If the client has an OPEN_DELEGATE_READ delegation on a file and requests an OPEN_DELEGATE_WRITE delegation, then | ||||
| the client is requesting atomic upgrade of its OPEN_DELEGATE_READ delegation | ||||
| to an OPEN_DELEGATE_WRITE delegation. If the client has an OPEN_DELEGATE_WRITE delegation on | ||||
| a file and requests an OPEN_DELEGATE_READ delegation, then the client is | ||||
| requesting atomic downgrade to an OPEN_DELEGATE_READ delegation. A server <bcp14>MAY</bcp14> | ||||
| support atomic upgrade or downgrade. If it does, then the | ||||
| returned delegation_type of OPEN_DELEGATE_READ | ||||
| or OPEN_DELEGATE_WRITE that is different from the delegation | ||||
| type the client currently has, indicates successful upgrade | ||||
| or downgrade. If the server does not support atomic delegation upgrade or | ||||
| downgrade, then ond_why will be set to WND4_NOT_SUPP_UPGRADE or | ||||
| WND4_NOT_SUPP_DOWNGRADE. | ||||
| </t> | ||||
| <t> | ||||
| OPEN4_SHARE_ACCESS_WANT_NO_DELEG means that the client wants no | ||||
| delegation. | ||||
| </t> | ||||
| <t> | ||||
| OPEN4_SHARE_ACCESS_WANT_CANCEL means that the client wants no | ||||
| delegation and wants to cancel any previously registered | ||||
| "want" for a delegation. | ||||
| </t> | ||||
| <t> | ||||
| The client may set one or both of | ||||
| OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and | ||||
| OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED. | ||||
| However, they will have no effect unless one of the following is set: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li>OPEN4_SHARE_ACCESS_WANT_READ_DELEG</li> | ||||
| <li>OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG</li> | ||||
| <li>OPEN4_SHARE_ACCESS_WANT_ANY_DELEG</li> | ||||
| </ul> | ||||
| <t> | ||||
| If the client specifies | ||||
| OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL, then it | ||||
| wishes to register a "want" for a delegation, in the event the | ||||
| OPEN results do not include a delegation. If so and the | ||||
| server denies the delegation due to insufficient resources, | ||||
| the server <bcp14>MAY</bcp14> later inform the client, via the | ||||
| CB_RECALLABLE_OBJ_AVAIL operation, that the resource | ||||
| limitation condition has eased. The server will tell the | ||||
| client that it intends to send a future | ||||
| CB_RECALLABLE_OBJ_AVAIL operation by setting delegation_type | ||||
| in the results to OPEN_DELEGATE_NONE_EXT, ond_why | ||||
| to WND4_RESOURCE, and ond_server_will_signal_avail set to | ||||
| TRUE. If | ||||
| ond_server_will_signal_avail is set to TRUE, the server <bcp14>MUST</bcp14> | ||||
| later send a CB_RECALLABLE_OBJ_AVAIL operation. | ||||
| </t> | ||||
| <t> | ||||
| If the client specifies | ||||
| OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED, then it | ||||
| wishes to register a "want" for a delegation, in the event the | ||||
| OPEN results do not include a delegation. If so and the server | ||||
| denies the delegation due to contention, the | ||||
| server <bcp14>MAY</bcp14> later inform the client, via the CB_PUSH_DELEG | ||||
| operation, that the contention condition | ||||
| has eased. The server will tell the client that it intends to | ||||
| send a future CB_PUSH_DELEG operation by setting | ||||
| delegation_type in the results to OPEN_DELEGATE_NONE_EXT, | ||||
| ond_why to WND4_CONTENTION, and | ||||
| ond_server_will_push_deleg to TRUE. If | ||||
| ond_server_will_push_deleg is TRUE, the server <bcp14>MUST</bcp14> later | ||||
| send a CB_PUSH_DELEG operation. | ||||
| </t> | ||||
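| <t> | ||||
| From the client's side, the two preceding paragraphs amount to a | ||||
| small lookup on the OPEN_DELEGATE_NONE_EXT result. The Python sketch | ||||
| below is illustrative; the function name expected_callback and its | ||||
| keyword parameters are hypothetical stand-ins for the ond_why, | ||||
| ond_server_will_signal_avail, and ond_server_will_push_deleg fields. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Hypothetical client-side helper; field values are plain strings
# and booleans rather than decoded XDR.

def expected_callback(ond_why, will_signal_avail=False,
                      will_push_deleg=False):
    """Return the callback the server has promised, or None."""
    if ond_why == "WND4_RESOURCE" and will_signal_avail:
        # Server MUST later send CB_RECALLABLE_OBJ_AVAIL.
        return "CB_RECALLABLE_OBJ_AVAIL"
    if ond_why == "WND4_CONTENTION" and will_push_deleg:
        # Server MUST later send CB_PUSH_DELEG.
        return "CB_PUSH_DELEG"
    # No promise was made; the client should not wait for a callback.
    return None
```
| ]]></sourcecode> | ||||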
| <t> | ||||
| If the client has previously registered a want for a | ||||
| delegation on a file, and then sends a request to register a | ||||
| want for a delegation on the same file, the server <bcp14>MUST</bcp14> return | ||||
| a new error: NFS4ERR_DELEG_ALREADY_WANTED. If the client | ||||
| wishes to register a different type of delegation want for the | ||||
| same file, it <bcp14>MUST</bcp14> first cancel the existing delegation want. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| In the absence of a persistent session, the client | ||||
| invokes exclusive create by setting the how parameter | ||||
| to EXCLUSIVE4 or EXCLUSIVE4_1. In these cases, the | ||||
| client provides a verifier that can reasonably be | ||||
| expected to be unique. A combination of a client | ||||
| identifier, perhaps the client network address, | ||||
| and a unique number generated by the client, perhaps | ||||
| the RPC transaction identifier, may be appropriate. | ||||
| </t> | ||||
| <t> | ||||
| If the object does not exist, the server creates the object and stores the | ||||
| verifier in stable storage. For file systems that do not provide a | ||||
| mechanism for the storage of arbitrary file attributes, the server may | ||||
| use one or more elements of the object's metadata to store the | ||||
| verifier. The verifier <bcp14>MUST</bcp14> be stored in stable storage to prevent | ||||
| erroneous failure on retransmission of the request. It is assumed that | ||||
| an exclusive create is being performed because exclusive semantics are | ||||
| critical to the application. Because of the expected usage, exclusive | ||||
| CREATE does not rely solely on the server's reply cache | ||||
| for storage of the verifier. A nonpersistent reply cache | ||||
| does not survive a crash and the session and reply cache | ||||
| may be deleted after a network partition that exceeds the | ||||
| lease time, thus opening failure windows. | ||||
| </t> | ||||
| <t> | ||||
| An NFSv4.1 server <bcp14>SHOULD NOT</bcp14> store the verifier in | ||||
| any of the file's <bcp14>RECOMMENDED</bcp14> or <bcp14>REQUIRED</bcp14> attributes. | ||||
| If it does, the server <bcp14>SHOULD</bcp14> use time_modify_set or | ||||
| time_access_set to store the verifier. | ||||
| The server <bcp14>SHOULD NOT</bcp14> store the verifier in the | ||||
| following attributes: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li>acl (it is desirable for access control to | ||||
| be established at creation),</li> | ||||
| <li>dacl (ditto),</li> | ||||
| <li>mode (ditto),</li> | ||||
| <li>owner (ditto),</li> | ||||
| <li>owner_group (ditto),</li> | ||||
| <li>retentevt_set (it may be desired to | ||||
| establish retention at creation),</li> | ||||
| <li>retention_hold (ditto),</li> | ||||
| <li>retention_set (ditto),</li> | ||||
| <li>sacl (it is desirable for auditing control | ||||
| to be established at creation),</li> | ||||
| <li>size (on some servers, size may have a | ||||
| limited range of values),</li> | ||||
| <li>mode_set_masked (as with mode), and</li> | ||||
| <li>time_creation (a meaningful file creation time | ||||
| should be set when the file is created).</li> | ||||
| </ul> | ||||
| <t> | ||||
| Another alternative for the server is to use a named attribute | ||||
| to store the verifier. | ||||
| </t> | ||||
| <t> | ||||
| Because the EXCLUSIVE4 create method does not specify | ||||
| initial attributes when processing an EXCLUSIVE4 create, | ||||
| the server | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <bcp14>SHOULD</bcp14> set the | ||||
| owner of the file to that corresponding to the credential of | ||||
| the request's RPC header. | ||||
| </li> | ||||
| <li> | ||||
| <bcp14>SHOULD NOT</bcp14> leave the file's access control to anyone | ||||
| but the owner of the file. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the server cannot support exclusive create | ||||
| semantics, possibly because of the requirement to | ||||
| commit the verifier to stable storage, it should fail | ||||
| the OPEN request with the error NFS4ERR_NOTSUPP. | ||||
| </t> | ||||
| <t> | ||||
| During an exclusive CREATE request, if the object | ||||
| already exists, the server reconstructs the object's | ||||
| verifier and compares it with the verifier in | ||||
| the request. If they match, the server treats the | ||||
| request as a success. The request is presumed to | ||||
| be a duplicate of an earlier, successful request | ||||
| for which the reply was lost and that the server | ||||
| duplicate request cache mechanism did not detect. If | ||||
| the verifiers do not match, the request is rejected | ||||
| with the status NFS4ERR_EXIST. | ||||
| </t> | ||||
| <t> | ||||
| After the client has performed a successful | ||||
| exclusive create, the attrset response indicates | ||||
| which attributes were used to store the verifier. | ||||
| If EXCLUSIVE4 was used, the attributes set in | ||||
| attrset were used for the verifier. If EXCLUSIVE4_1 | ||||
| was used, the client determines the attributes | ||||
| used for the verifier by comparing attrset with | ||||
| cva_attrs.attrmask; any bits set in the former but | ||||
| not the latter identify the attributes used to store | ||||
| the verifier. The client <bcp14>MUST</bcp14> immediately send a | ||||
| SETATTR to set attributes used to store the verifier. | ||||
| Until it does so, the attributes used to store the | ||||
| verifier cannot be relied upon. The subsequent | ||||
| SETATTR <bcp14>MUST NOT</bcp14> occur in the same COMPOUND request | ||||
| as the OPEN. | ||||
| </t> | ||||
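| <t> | ||||
| The comparison of attrset with cva_attrs.attrmask can be sketched | ||||
| as below, modeling a bitmap4 as a list of 32-bit words. The function | ||||
| name verifier_attrs is hypothetical; for EXCLUSIVE4, where no | ||||
| attributes are requested, an empty list can be passed as the | ||||
| requested mask. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Hypothetical helper; bitmap4 values are lists of 32-bit integers.

def verifier_attrs(attrset, requested):
    """Return the bits set in attrset but not in requested.

    attrset is the bitmap from the OPEN reply; requested is
    cva_attrs.attrmask from the request. The shorter bitmap is
    padded with zero words before comparison.
    """
    width = max(len(attrset), len(requested))
    a = list(attrset) + [0] * (width - len(attrset))
    r = list(requested) + [0] * (width - len(requested))
    # Keep each word within 32 bits after the complement.
    return [x & ~y & 0xFFFFFFFF for x, y in zip(a, r)]
```
| ]]></sourcecode> | ||||
| <t> | ||||
| Any nonzero word in the result names attributes that the client | ||||
| would then reset with a SETATTR in a later COMPOUND. | ||||
| </t> | ||||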
| <t> | ||||
| Unless a persistent session is used, use of the | ||||
| GUARDED4 create method does not provide exactly-once | ||||
| semantics. In particular, if a reply is lost and | ||||
| the server does not detect the retransmission of the | ||||
| request, the operation can fail with NFS4ERR_EXIST, | ||||
| even though the create was performed successfully. | ||||
| The client would use this behavior in the case that | ||||
| the application has not requested an exclusive create | ||||
| but has asked to have the file truncated when the | ||||
| file is opened. In the case of the client timing | ||||
| out and retransmitting the create request, the client | ||||
| can use GUARDED4 to prevent a sequence like | ||||
| create, write, create (retransmitted) from occurring. | ||||
| </t> | ||||
| <t> | ||||
| For SHARE reservations, the value of the expression | ||||
| (share_access & ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) <bcp14>MUST</bcp14> be | ||||
| one of OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, | ||||
| or OPEN4_SHARE_ACCESS_BOTH. If not, the server <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_INVAL. The value of share_deny <bcp14>MUST</bcp14> | ||||
| be one of OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, | ||||
| OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH. If not, the | ||||
| server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| </t> | ||||
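| <t> | ||||
| A minimal Python sketch of these two checks follows. It evaluates | ||||
| the expression exactly as written above and does not address how the | ||||
| two *_WHEN_* flags, which lie outside the mask, should be treated; | ||||
| the function name check_share is hypothetical and the numeric values | ||||
| should be checked against the OPEN arguments XDR. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Values assumed to match the OPEN4args XDR constants.
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
VALID_ACCESS = (0x1, 0x2, 0x3)     # READ, WRITE, BOTH
VALID_DENY = (0x0, 0x1, 0x2, 0x3)  # NONE, READ, WRITE, BOTH

def check_share(share_access, share_deny):
    """Return the status a server would give for these share fields."""
    # Literal form of the expression in the text; *_WHEN_* flags
    # are not handled in this sketch.
    access = share_access & ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    if access not in VALID_ACCESS:
        return "NFS4ERR_INVAL"
    if share_deny not in VALID_DENY:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||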
| <t> | ||||
| Based on the share_access value (OPEN4_SHARE_ACCESS_READ, | ||||
| OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH), the client | ||||
| should check that the requester has the proper access rights | ||||
| to perform the specified operation. This would generally be | ||||
| the results of applying the ACL access rules to the file for the | ||||
| current requester. However, just as with the ACCESS operation, the | ||||
| client should not attempt to second-guess the server's decisions, as | ||||
| access rights may change and may be subject to server administrative | ||||
| controls outside the ACL framework. If the requester's READ or | ||||
| WRITE operation is not authorized (depending on the share_access | ||||
| value), the server <bcp14>MUST</bcp14> return NFS4ERR_ACCESS. | ||||
| </t> | ||||
| <t> | ||||
| Note that if the client ID was not created | ||||
| with the EXCHGID4_FLAG_BIND_PRINC_STATEID capability set in | ||||
| the reply to EXCHANGE_ID, then the server <bcp14>MUST | ||||
| NOT</bcp14> impose any requirement that READs and WRITEs | ||||
| sent for an open file have the same credentials | ||||
| as the OPEN itself, and the server is <bcp14>REQUIRED</bcp14> to | ||||
| perform access checking on the READs and WRITEs | ||||
| themselves. Otherwise, if the reply to EXCHANGE_ID | ||||
| did have EXCHGID4_FLAG_BIND_PRINC_STATEID set, | ||||
| then with one exception, the credentials used in the OPEN request <bcp14>MUST</bcp14> | ||||
| match those used in the READs and WRITEs, and the | ||||
| stateids in the READs and WRITEs <bcp14>MUST</bcp14> match, or be | ||||
| derived from the stateid from the reply to OPEN. | ||||
| The exception is if SP4_SSV or SP4_MACH_CRED state | ||||
| protection is used, and the spo_must_allow | ||||
| result of EXCHANGE_ID includes the READ and/or WRITE | ||||
| operations. In that case, the machine or SSV | ||||
| credential will be allowed to send READ and/or WRITE. | ||||
| See <xref target="OP_EXCHANGE_ID" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| If the component provided to OPEN is a symbolic link, the error | ||||
| NFS4ERR_SYMLINK will be returned to the client, while if it is | ||||
| a directory the error NFS4ERR_ISDIR will be returned. | ||||
| If the component is neither | ||||
| of those but not an ordinary file, the error NFS4ERR_WRONG_TYPE | ||||
| is returned. If the current | ||||
| filehandle is not a directory, the error NFS4ERR_NOTDIR will be | ||||
| returned. | ||||
| </t> | ||||
| <t> | ||||
| The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows | ||||
| a client to avoid the common implementation practice of renaming | ||||
| an open file to ".nfs<unique value>" after it removes the file. | ||||
| After the server returns OPEN4_RESULT_PRESERVE_UNLINKED, if a client | ||||
| sends a REMOVE operation that would reduce the file's link count to | ||||
| zero, the server <bcp14>SHOULD</bcp14> report a value | ||||
| of zero for the numlinks attribute on the file. | ||||
| </t> | ||||
| <t> | ||||
| If another client has a delegation of the file being opened that | ||||
| conflicts with the open being done (sometimes depending on the | ||||
| share_access or share_deny value specified), | ||||
| the delegation(s) <bcp14>MUST</bcp14> be recalled, and the | ||||
| operation cannot proceed until each such delegation is returned | ||||
| or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| In the case of an OPEN_DELEGATE_WRITE delegation, any open by a different client | ||||
| will conflict, while for an OPEN_DELEGATE_READ delegation, only opens with one | ||||
| of the following characteristics will be considered conflicting: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The value of share_access includes the bit | ||||
| OPEN4_SHARE_ACCESS_WRITE. | ||||
| </li> | ||||
| <li> | ||||
| The value of share_deny specifies OPEN4_SHARE_DENY_READ or | ||||
| OPEN4_SHARE_DENY_BOTH. | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_CREATE is specified together with UNCHECKED4, the | ||||
| size attribute is specified as zero (for truncation), and | ||||
| an existing file is truncated. | ||||
| </li> | ||||
| </ul> | ||||
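<t>
The conflict rules listed above can be summarized as a small predicate.
This Python sketch is not normative: the function name, the string
representation of the delegation type, and the truncating_create flag
(standing in for the UNCHECKED4 create with size zero that truncates an
existing file) are all hypothetical. It assumes the open comes from a
client other than the delegation holder.
</t>

```python
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_BOTH = 0x00000003


def open_conflicts_with_delegation(deleg_type, share_access, share_deny,
                                   truncating_create=False):
    """Return True if another client's open conflicts with the delegation."""
    if deleg_type == "OPEN_DELEGATE_WRITE":
        # Any open by a different client conflicts with a write delegation.
        return True
    if deleg_type == "OPEN_DELEGATE_READ":
        wants_write = bool(share_access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = share_deny in (OPEN4_SHARE_DENY_READ,
                                     OPEN4_SHARE_DENY_BOTH)
        return wants_write or denies_read or truncating_create
    # No delegation of a known type: nothing to conflict with.
    return False
```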
| <t> | ||||
| If OPEN4_CREATE is specified and the file does not exist and | ||||
| the current filehandle designates a directory for which another | ||||
| client holds a directory delegation, then, unless the delegation | ||||
| is such that the situation can be resolved by sending a notification, | ||||
| the delegation <bcp14>MUST</bcp14> be recalled, and the operation cannot proceed | ||||
| until the delegation is returned or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| If OPEN4_CREATE is specified and the file does not exist and | ||||
| the current filehandle designates a directory for which | ||||
| one or more directory delegations exist, then, when those delegations | ||||
| request such notifications, NOTIFY4_ADD_ENTRY will be generated | ||||
| as a result of this operation. | ||||
| </t> | ||||
| <section toc="exclude" anchor="open_getfh_issue" numbered="true"> | ||||
| <name>Warning to Client Implementors</name> | ||||
| <t> | ||||
| OPEN resembles LOOKUP in that it generates a filehandle for the client | ||||
| to use. Unlike LOOKUP though, OPEN creates server state on the | ||||
| filehandle. In normal circumstances, the client can only release this | ||||
| state with a CLOSE operation. CLOSE uses the current filehandle to | ||||
| determine which file to close. Therefore, the client <bcp14>MUST</bcp14> follow every | ||||
| OPEN operation with a GETFH operation in the same COMPOUND procedure. | ||||
| This will supply the client with the filehandle such that CLOSE can be | ||||
| used appropriately. | ||||
| </t> | ||||
| <t> | ||||
| Simply waiting for the lease on the file to expire is insufficient | ||||
| because the server may maintain the state indefinitely as long as | ||||
| another client does not attempt to make a conflicting access to the | ||||
| same file. | ||||
| </t> | ||||
| <t> | ||||
| See also <xref target="COMPOUND_Sizing_Issues" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_OPENATTR" numbered="true" toc="default"> | ||||
| <name>Operation 19: OPENATTR - Open Named Attribute Directory</name> | ||||
| <section toc="exclude" anchor="OP_OPENATTR_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct OPENATTR4args { | ||||
| /* CURRENT_FH: object */ | ||||
| bool createdir; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPENATTR_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct OPENATTR4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new CURRENT_FH: named attribute | ||||
| * directory | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPENATTR_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The OPENATTR operation is used to obtain the filehandle of the named | ||||
| attribute directory associated with the current filehandle. The | ||||
| result of the OPENATTR will be a filehandle to an object of type | ||||
| NF4ATTRDIR. From this filehandle, READDIR and LOOKUP operations can | ||||
| be used to obtain filehandles for the various named attributes | ||||
| associated with the original file system object. Filehandles returned | ||||
| within the named attribute directory will designate objects of | ||||
| type NF4NAMEDATTR. | ||||
| </t> | ||||
| <t> | ||||
| The createdir argument allows the client to signify if a named | ||||
| attribute directory should be created as a result of the OPENATTR | ||||
| operation. Some clients may use the OPENATTR operation with a value | ||||
| of FALSE for createdir to determine if any named attributes exist for | ||||
| the object. If none exist, then NFS4ERR_NOENT will be returned. If | ||||
| createdir has a value of TRUE and no named attribute directory exists, | ||||
| one is created and its filehandle becomes the current filehandle. | ||||
| On the other hand, if createdir has a value of TRUE and the named | ||||
| attribute directory already exists, no error results and the filehandle | ||||
| of the existing directory becomes the current filehandle. The | ||||
| creation of a named attribute directory assumes | ||||
| that the server has implemented named attribute support in this | ||||
| fashion; the server is not required to do so by this definition. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle designates an object of type | ||||
| NF4NAMEDATTR (a named attribute) or NF4ATTRDIR (a named attribute | ||||
| directory), an error of NFS4ERR_WRONG_TYPE is returned to the | ||||
| client. Named attributes or a named attribute directory <bcp14>MUST NOT</bcp14> | ||||
| have their own named attributes. | ||||
| </t> | ||||
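<t>
The outcomes described for OPENATTR can be tabulated in a few lines. The
Python sketch below is only a model: the function name openattr_outcome,
the string status codes, and the returned tuple are hypothetical, and it
assumes a server that supports named attributes and is willing to create
the directory on request.
</t>

```python
def openattr_outcome(object_type, attrdir_exists, createdir):
    """Return (status, attrdir_exists_afterwards) for an OPENATTR request."""
    if object_type in ("NF4NAMEDATTR", "NF4ATTRDIR"):
        # Named attributes and attribute directories have no named
        # attributes of their own.
        return ("NFS4ERR_WRONG_TYPE", attrdir_exists)
    if attrdir_exists:
        # An existing directory is returned whether or not createdir is set.
        return ("NFS4_OK", True)
    if createdir:
        # The directory is created and becomes the current filehandle.
        return ("NFS4_OK", True)
    return ("NFS4ERR_NOENT", False)
```

<t>
A client probing for named attributes would call this with createdir set
to FALSE and treat NFS4ERR_NOENT as "none exist".
</t>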
| </section> | ||||
| <section toc="exclude" anchor="OP_OPENATTR_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the server does not support named attributes for the current | ||||
| filehandle, an error of NFS4ERR_NOTSUPP will be returned to the | ||||
| client. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_OPEN_DOWNGRADE" numbered="true" toc="default"> | ||||
| <name>Operation 21: OPEN_DOWNGRADE - Reduce Open File Access</name> | ||||
| <section toc="exclude" anchor="OP_OPEN_DOWNGRADE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct OPEN_DOWNGRADE4args { | ||||
| /* CURRENT_FH: opened file */ | ||||
| stateid4 open_stateid; | ||||
| seqid4 seqid; | ||||
| uint32_t share_access; | ||||
| uint32_t share_deny; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_DOWNGRADE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct OPEN_DOWNGRADE4resok { | ||||
| stateid4 open_stateid; | ||||
| }; | ||||
| union OPEN_DOWNGRADE4res switch(nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| OPEN_DOWNGRADE4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_DOWNGRADE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is used to adjust the access and deny states | ||||
| for a given open. This is necessary when a given open-owner opens the | ||||
| same file multiple times with different access and deny | ||||
| values. In this situation, a close of one of the opens may change the | ||||
| appropriate share_access and share_deny flags to remove bits | ||||
| associated with opens no longer in effect. | ||||
| </t> | ||||
| <t> | ||||
| Valid values for the expression (share_access & | ||||
| ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) are OPEN4_SHARE_ACCESS_READ, | ||||
| OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH. If the client | ||||
| specifies other values, the server <bcp14>MUST</bcp14> reply with NFS4ERR_INVAL. | ||||
| </t> | ||||
| <t> | ||||
| Valid values for the share_deny field are | ||||
| OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, | ||||
| OPEN4_SHARE_DENY_WRITE, or OPEN4_SHARE_DENY_BOTH. If | ||||
| the client specifies other values, the server <bcp14>MUST</bcp14> | ||||
| reply with NFS4ERR_INVAL. | ||||
| </t> | ||||
| <t> | ||||
| After checking for valid values of share_access and | ||||
| share_deny, the server replaces the current access | ||||
| and deny modes on the file with share_access and | ||||
| share_deny subject to the following constraints: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The bits in share_access <bcp14>SHOULD</bcp14> equal the union of the share_access | ||||
| bits (not including OPEN4_SHARE_ACCESS_WANT_* bits) | ||||
| specified for some subset of the OPENs | ||||
| in effect for the current open-owner on the current | ||||
| file. | ||||
| </li> | ||||
| <li> | ||||
| The bits in share_deny <bcp14>SHOULD</bcp14> equal the union of the | ||||
| share_deny bits specified for some subset | ||||
| of the OPENs in effect for the current open-owner | ||||
| on the current file. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the above constraints are not respected, | ||||
| the server <bcp14>SHOULD</bcp14> return the error NFS4ERR_INVAL. | ||||
| Since share_access and share_deny bits should be | ||||
| subsets of those already granted, short of a defect | ||||
| in the client or server implementation, it is not | ||||
| possible for the OPEN_DOWNGRADE request to be denied | ||||
| because of conflicting share reservations. | ||||
| </t> | ||||
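<t>
One way to picture the subset constraints is the brute-force check below.
This Python sketch is an assumption-laden illustration, not a server
algorithm: the names are hypothetical, each open is modeled as a
(share_access, share_deny) pair for the current open-owner and file, the
access and deny checks are made independently, and only non-empty subsets
are considered.
</t>

```python
from itertools import combinations

OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00


def _is_union_of_some_subset(bits, candidates):
    """True if bits equals the OR of some non-empty subset of candidates."""
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            union = 0
            for value in subset:
                union |= value
            if union == bits:
                return True
    return False


def check_downgrade(share_access, share_deny, opens):
    """Return "NFS4_OK" or "NFS4ERR_INVAL" for an OPEN_DOWNGRADE request."""
    mask = ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    access = share_access & mask
    granted_access = [a & mask for a, _ in opens]
    granted_deny = [d for _, d in opens]
    if (_is_union_of_some_subset(access, granted_access)
            and _is_union_of_some_subset(share_deny, granted_deny)):
        return "NFS4_OK"
    return "NFS4ERR_INVAL"
```

<t>
For example, with opens of (READ, DENY_WRITE) and (WRITE, DENY_NONE) in
effect, a downgrade to (WRITE, DENY_NONE) passes this check, while a
downgrade that asks for DENY_READ does not.
</t>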
| <t> | ||||
| The seqid argument is not used in NFSv4.1, <bcp14>MAY</bcp14> be any value, and | ||||
| <bcp14>MUST</bcp14> be ignored by the server. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_OPEN_DOWNGRADE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| An OPEN_DOWNGRADE operation may make OPEN_DELEGATE_READ delegations grantable | ||||
| where they were not previously. Servers may choose to respond | ||||
| immediately if there are pending delegation want requests or may | ||||
| respond to the situation at a later time. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_PUTFH" numbered="true" toc="default"> | ||||
| <name>Operation 22: PUTFH - Set Current Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_PUTFH_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct PUTFH4args { | ||||
| nfs_fh4 object; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTFH_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct PUTFH4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new CURRENT_FH: argument to PUTFH | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation replaces the current filehandle with the filehandle provided as an | ||||
| argument. It clears the current stateid. | ||||
| </t> | ||||
| <t> | ||||
| If the security mechanism used by the requester does not meet the | ||||
| requirements of the filehandle provided to this operation, the server | ||||
| <bcp14>MUST</bcp14> return NFS4ERR_WRONGSEC. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_filehandle" format="default"/> for more details on the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_stateid" format="default"/> for more details on the current | ||||
| stateid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| This operation is used | ||||
| in an NFS request to set the context for file accessing operations that | ||||
| follow in the same COMPOUND request. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_PUTPUBFH" numbered="true" toc="default"> | ||||
| <name>Operation 23: PUTPUBFH - Set Public Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_PUTPUBFH_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTPUBFH_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct PUTPUBFH4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new CURRENT_FH: public fh | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTPUBFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation replaces the current filehandle with the filehandle that | ||||
| represents the public filehandle of the server's namespace. | ||||
| This filehandle may be different from the "root" filehandle | ||||
| that may be associated with some other directory on the server. | ||||
| </t> | ||||
| <t> | ||||
| PUTPUBFH also clears the current stateid. | ||||
| </t> | ||||
| <t> | ||||
| The public filehandle represents the concepts embodied in <xref target="RFC2054" format="default">RFC 2054</xref>, <xref target="RFC2055" format="default">RFC 2055</xref>, and <xref target="RFC2224" format="default">RFC 2224</xref>. The intent for NFSv4.1 | ||||
| is that the public filehandle (represented by the PUTPUBFH | ||||
| operation) be used as a method of providing WebNFS server | ||||
| compatibility with NFSv3. | ||||
| </t> | ||||
| <t> | ||||
| The public filehandle and the root filehandle (represented by the | ||||
| PUTROOTFH operation) <bcp14>SHOULD</bcp14> be equivalent. If the public and root | ||||
| filehandles are not equivalent, then the directory corresponding to the public filehandle <bcp14>MUST</bcp14> be a | ||||
| descendant of the directory corresponding to the root filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_filehandle" format="default"/> for more details on the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_stateid" format="default"/> for more details on the current | ||||
| stateid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTPUBFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| This operation is used | ||||
| in an NFS request to set the context for file accessing operations that | ||||
| follow in the same COMPOUND request. | ||||
| </t> | ||||
| <t> | ||||
| With the NFSv3 public filehandle, the client is | ||||
| able to specify whether the pathname provided in the LOOKUP | ||||
| should be evaluated as either an absolute path relative to the | ||||
| server's root or relative to the public filehandle. <xref target="RFC2224" format="default">RFC 2224</xref> contains further discussion of | ||||
| the functionality. With NFSv4.1, that type of | ||||
| specification is not directly available in the LOOKUP operation. | ||||
| This is because the component separators needed | ||||
| to specify absolute vs. relative evaluation are not allowed in NFSv4. Therefore, the client is responsible for constructing its | ||||
| request such that the use of either PUTROOTFH or PUTPUBFH | ||||
| signifies absolute or relative evaluation of an NFS URL, | ||||
| respectively. | ||||
| </t> | ||||
| <t> | ||||
| Note that there are warnings mentioned in <xref target="RFC2224" format="default">RFC 2224</xref> with respect to the use of | ||||
| absolute evaluation and the restrictions the server may place on | ||||
| that evaluation with respect to how much of its namespace has | ||||
| been made available. These same warnings apply to NFSv4.1. It is likely, therefore, that because of server | ||||
| implementation details, an NFSv3 absolute public | ||||
| filehandle lookup may behave differently from an NFSv4.1 | ||||
| absolute resolution. | ||||
| </t> | ||||
| <t> | ||||
| There is a form of security negotiation as described | ||||
| in <xref target="RFC2755" format="default">RFC 2755</xref> that uses | ||||
| the public filehandle and an overloading of the pathname. | ||||
| This method is not available with NFSv4.1 as | ||||
| filehandles are not overloaded with special | ||||
| meaning and therefore do not provide the same | ||||
| framework as NFSv3. Clients should therefore use | ||||
| the security negotiation mechanisms described in | ||||
| <xref target="Security_Service_Negotiation" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_PUTROOTFH" numbered="true" toc="default"> | ||||
| <name>Operation 24: PUTROOTFH - Set Root Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_PUTROOTFH_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTROOTFH_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct PUTROOTFH4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new CURRENT_FH: root fh | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTROOTFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation replaces the current filehandle with the filehandle that represents | ||||
| the root of the server's namespace. From this filehandle, a LOOKUP | ||||
| operation can locate any other filehandle on the server. This | ||||
| filehandle may be different from the "public" filehandle that may be | ||||
| associated with some other directory on the server. | ||||
| </t> | ||||
| <t> | ||||
| PUTROOTFH also clears the current stateid. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_filehandle" format="default"/> for more details on the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_stateid" format="default"/> for more details on the current | ||||
| stateid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_PUTROOTFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| This operation is used | ||||
| in an NFS request to set the context for file accessing operations that | ||||
| follow in the same COMPOUND request. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_READ" numbered="true" toc="default"> | ||||
| <name>Operation 25: READ - Read from File</name> | ||||
| <section toc="exclude" anchor="OP_READ_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct READ4args { | ||||
| /* CURRENT_FH: file */ | ||||
| stateid4 stateid; | ||||
| offset4 offset; | ||||
| count4 count; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READ_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct READ4resok { | ||||
| bool eof; | ||||
| opaque data<>; | ||||
| }; | ||||
| union READ4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| READ4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READ_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The READ operation reads data from the regular file identified by the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| The client provides the offset at which the READ is to start and a | ||||
| count of how many bytes are to be read. An offset of zero means | ||||
| to read data starting at the beginning of the file. If offset is | ||||
| greater than or equal to the size of the file, the status NFS4_OK is | ||||
| returned with a data length set to zero and eof is set to TRUE. | ||||
| The READ is subject to access permissions checking. | ||||
| </t> | ||||
| <t> | ||||
| If the client specifies a count value of zero, the READ succeeds | ||||
| and returns zero bytes of data, again subject to access permissions | ||||
| checking. The server may choose to return fewer bytes than specified | ||||
| by the client. The client needs to check for this condition and | ||||
| handle the condition appropriately. | ||||
| </t> | ||||
| <t> | ||||
| Except when special stateids are used, the | ||||
| stateid value for a READ request represents a value returned from | ||||
| a previous byte-range lock or share reservation request or the stateid | ||||
| associated with a delegation. The stateid identifies the associated | ||||
| owners, if any, and is | ||||
| used by the server to verify that the associated locks are still | ||||
| valid (e.g., have not been revoked). | ||||
| </t> | ||||
| <t> | ||||
| If the read ended at the end-of-file (formally, in a correctly formed | ||||
| READ operation, if offset + count is equal to the size of the file), or | ||||
| the READ operation extends beyond the size of the file (if offset + | ||||
| count is greater than the size of the file), eof is returned as TRUE; | ||||
| otherwise, it is FALSE. A successful READ of an empty file will always | ||||
| return eof as TRUE. | ||||
| </t> | ||||
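<t>
The eof rules can be checked against a small model. In the Python sketch
below, the function name read_reply_shape is hypothetical, and the model
assumes a server that returns every available byte in the requested range
(that is, no short reads).
</t>

```python
def read_reply_shape(file_size, offset, count):
    """Return (data_length, eof) for a READ against a file of file_size bytes."""
    if offset >= file_size:
        # Reading at or past the end yields no data and eof TRUE; this
        # also covers any successful READ of an empty file.
        return (0, True)
    length = min(count, file_size - offset)
    # eof is TRUE when the request reaches or extends beyond end-of-file.
    eof = offset + count >= file_size
    return (length, eof)
```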
| <t> | ||||
| If the current filehandle is not an ordinary file, an error will be | ||||
| returned to the client. In the case that the current filehandle | ||||
| represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. | ||||
| If the current filehandle designates a symbolic link, | ||||
| NFS4ERR_SYMLINK is returned. In all other cases, | ||||
| NFS4ERR_WRONG_TYPE is returned. | ||||
| </t> | ||||
| <t> | ||||
| For a READ with a stateid value of all bits equal to zero, the server <bcp14>MAY</bcp14> allow | ||||
| the READ to be serviced subject to mandatory byte-range locks or the current | ||||
| share deny modes for the file. For a READ with a stateid value of all | ||||
| bits equal to one, the server <bcp14>MAY</bcp14> allow READ operations to bypass locking checks | ||||
| at the server. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READ_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the server returns a "short read" (i.e., less data than requested and eof is set to FALSE), the client should send another READ to get the | ||||
| remaining data. A server may return less data than requested under | ||||
| several circumstances. The file may have been truncated by another | ||||
| client or perhaps on the server itself, changing the file size from | ||||
| what the requesting client believes to be the case. This would reduce | ||||
| the actual amount of data available to the client. It is possible | ||||
| that the server reduces the transfer size and so returns a short | ||||
| read result. Server resource exhaustion may also result in a | ||||
| short read. | ||||
| </t> | ||||
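<t>
A client-side loop for handling short reads might look like the following
Python sketch. Everything named here is hypothetical: read_op stands in
for sending one READ and returning (data, eof), and make_fake_server is a
toy server that caps each transfer, included only so the loop can be
exercised. The guard against an empty reply with eof FALSE is an addition
of this sketch to avoid looping forever.
</t>

```python
def read_range(read_op, offset, count):
    """Collect up to count bytes starting at offset, retrying short reads."""
    chunks = []
    while count > 0:
        data, eof = read_op(offset, count)
        chunks.append(data)
        offset += len(data)
        count -= len(data)
        if eof or not data:
            # Stop at end-of-file, or if the server made no progress.
            break
    return b"".join(chunks)


def make_fake_server(contents, max_transfer):
    """Build a toy read_op that never returns more than max_transfer bytes."""
    def read_op(offset, count):
        data = contents[offset:offset + min(count, max_transfer)]
        eof = offset + len(data) >= len(contents)
        return data, eof
    return read_op
```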
| <t> | ||||
| If mandatory byte-range locking is in effect for the file, and if the byte-range | ||||
| corresponding to the data to be read from the file is WRITE_LT locked by an | ||||
| owner not associated with the stateid, the server will return the | ||||
| NFS4ERR_LOCKED error. The client should try to get the appropriate | ||||
| READ_LT via the LOCK operation before re-attempting the | ||||
| READ. When the READ completes, the client should release the byte-range | ||||
| lock via LOCKU. | ||||
| </t> | ||||
| <t> | ||||
| If another client has an OPEN_DELEGATE_WRITE delegation for the file being read, | ||||
| the delegation must be recalled, and the | ||||
| operation cannot proceed until that delegation is returned | ||||
| or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| Normally, delegations will not be recalled as a result of a READ | ||||
| operation since the recall will occur as a result of an earlier | ||||
| OPEN. However, since it is possible for a READ to be done with | ||||
| a special stateid, the server needs to check for this case even | ||||
| though the client should have done an OPEN previously. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_READDIR" numbered="true" toc="default"> | ||||
| <name>Operation 26: READDIR - Read Directory</name> | ||||
| <section toc="exclude" anchor="OP_READDIR_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct READDIR4args { | ||||
| /* CURRENT_FH: directory */ | ||||
| nfs_cookie4 cookie; | ||||
| verifier4 cookieverf; | ||||
| count4 dircount; | ||||
| count4 maxcount; | ||||
| bitmap4 attr_request; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READDIR_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct entry4 { | ||||
| nfs_cookie4 cookie; | ||||
| component4 name; | ||||
| fattr4 attrs; | ||||
| entry4 *nextentry; | ||||
| }; | ||||
| struct dirlist4 { | ||||
| entry4 *entries; | ||||
| bool eof; | ||||
| }; | ||||
| struct READDIR4resok { | ||||
| verifier4 cookieverf; | ||||
| dirlist4 reply; | ||||
| }; | ||||
| union READDIR4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| READDIR4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READDIR_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The READDIR operation retrieves a variable number of entries from a | ||||
| file system directory and returns client-requested attributes for each | ||||
| entry along with information to allow the client to request additional | ||||
| directory entries in a subsequent READDIR. | ||||
| </t> | ||||
| <t> | ||||
| The arguments contain a cookie value that represents where the READDIR | ||||
| should start within the directory. A value of zero for the cookie | ||||
| is used to start reading at the beginning of the directory. For | ||||
| subsequent READDIR requests, the client specifies a cookie value that | ||||
| is provided by the server on a previous READDIR request. | ||||
| </t> | ||||
| <t> | ||||
| The request's cookieverf field should be set to 0 | ||||
| (zero) when the request's cookie field is zero | ||||
| (first read of the directory). On subsequent requests, the | ||||
| cookieverf field must match the cookieverf returned | ||||
| by the READDIR in which the cookie was acquired. | ||||
| If the server determines that the cookieverf | ||||
| is no longer valid for the directory, the error | ||||
| NFS4ERR_NOT_SAME must be returned. | ||||
| </t> | ||||
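<t>
The cookie and cookieverf handshake can be sketched as a client loop. The
Python below is a toy model under stated assumptions: readdir_op stands in
for one READDIR round trip returning (cookieverf, entries, eof), entries
are (cookie, name) pairs, the fake server assigns cookies as index + 3,
and StaleCookieVerifier stands in for an NFS4ERR_NOT_SAME reply. None of
these names come from the protocol.
</t>

```python
ZERO_VERIFIER = bytes(8)  # verifier4 is an 8-byte opaque value


class StaleCookieVerifier(Exception):
    """Stands in for an NFS4ERR_NOT_SAME reply in this sketch."""


def make_fake_readdir(names, verifier, batch):
    """Build a toy readdir_op returning at most batch entries per call."""
    def readdir_op(cookie, cookieverf):
        if cookie == 0:
            start = 0
        else:
            if cookieverf != verifier:
                raise StaleCookieVerifier()
            start = cookie - 2  # entry i has cookie i + 3; resume at i + 1
        chunk = names[start:start + batch]
        entries = [(start + i + 3, name) for i, name in enumerate(chunk)]
        eof = start + len(chunk) >= len(names)
        return verifier, entries, eof
    return readdir_op


def list_directory(readdir_op):
    """Read a whole directory, carrying cookie and cookieverf forward."""
    cookie, cookieverf = 0, ZERO_VERIFIER  # first request: both zero
    names = []
    while True:
        cookieverf, entries, eof = readdir_op(cookie, cookieverf)
        for entry_cookie, name in entries:
            names.append(name)
            cookie = entry_cookie  # resume after the last entry seen
        if eof or not entries:
            break
    return names
```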
| <t> | ||||
| The dircount field of the request is a hint of the maximum number | ||||
| of bytes of directory information that should be returned. This value | ||||
| represents the total length of the names of the directory entries and the | ||||
| cookie value for these entries. This length represents the XDR | ||||
| encoding of the data (names and cookies) and not the length in the | ||||
| native format of the server. | ||||
| </t> | ||||
| <t> | ||||
| The maxcount field of the request represents the maximum | ||||
| total size of all of the data being returned within | ||||
| the READDIR4resok structure and includes the XDR | ||||
| overhead. The server <bcp14>MAY</bcp14> return less data. If the | ||||
| server is unable to return a single directory entry | ||||
| within the maxcount limit, the error NFS4ERR_TOOSMALL | ||||
| <bcp14>MUST</bcp14> be returned to the client. | ||||
| </t> | ||||
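| <t> | ||||
| The interaction of dircount, maxcount, and NFS4ERR_TOOSMALL can be | ||||
| sketched as a server-side packing routine. This is an illustration only: | ||||
| the names xdr_string_size and pack_entries are hypothetical, the | ||||
| attribute sizes are supplied by the caller, and the byte accounting | ||||
| assumes the XDR layout of READDIR4resok shown in the RESULTS section. | ||||
| Treating dircount as a soft limit that never blocks the first entry is | ||||
| a design choice of this sketch. | ||||
| </t> | ||||

```python
from typing import List, Tuple

# Assumed fixed XDR cost: cookieverf (8) + end-of-list marker (4) + eof (4).
RESOK_OVERHEAD = 8 + 4 + 4


def xdr_string_size(name: str) -> int:
    """XDR size of a string: 4-byte length plus data padded to 4 bytes."""
    n = len(name.encode("utf-8"))
    return 4 + (n + 3) // 4 * 4


def pack_entries(entries: List[Tuple[str, int]], dircount: int,
                 maxcount: int) -> Tuple[str, List[str]]:
    """Pick entries to return; entries holds (name, encoded attr size)."""
    chosen: List[str] = []
    dir_bytes = 0
    total = RESOK_OVERHEAD
    for name, attr_size in entries:
        dir_part = 8 + xdr_string_size(name)  # cookie4 + name
        full = 4 + dir_part + attr_size       # list pointer + entry body
        if total + full > maxcount:
            break
        # dircount is a hint; zero means "bounded by maxcount alone".
        if dircount != 0 and chosen and dir_bytes + dir_part > dircount:
            break
        chosen.append(name)
        dir_bytes += dir_part
        total += full
    if entries and not chosen:
        # Not even one entry fits within maxcount.
        return ("NFS4ERR_TOOSMALL", [])
    return ("NFS4_OK", chosen)
```

| <t> | ||||
| With three entries of 20 attribute bytes each, a maxcount of 100 admits | ||||
| the first two, while a maxcount of 50 admits none and yields | ||||
| NFS4ERR_TOOSMALL. | ||||
| </t> | ||||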
| <t> | ||||
| Finally, the request's attr_request field represents | ||||
| the list of attributes to be returned for each | ||||
| directory entry supplied by the server. | ||||
| </t> | ||||
| <t> | ||||
| A successful reply consists of a list of | ||||
| directory entries. Each of these entries contains the name of the | ||||
| directory entry, a cookie value for that entry, and the associated | ||||
| attributes as requested. The "eof" flag has a value of TRUE if there | ||||
| are no more entries in the directory. | ||||
| </t> | ||||
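| <t> | ||||
| The cookie, cookieverf, and eof handling described so far can be sketched | ||||
| as a client-side loop. This is a minimal illustration, not part of the | ||||
| protocol: FakeServer, readdir, and list_directory are hypothetical names, | ||||
| and the in-memory server stands in for a real COMPOUND exchange. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import List, Tuple

ZERO_VERF = bytes(8)  # verifier4 is 8 opaque bytes; all zeros on first read


@dataclass
class Entry:
    cookie: int
    name: str


class FakeServer:
    """Hypothetical in-memory stand-in for a server answering READDIR."""

    def __init__(self, names: List[str], page: int = 2) -> None:
        # Cookies 0, 1, and 2 are reserved, so entries start at 3.
        self.entries = [Entry(i + 3, n) for i, n in enumerate(names)]
        self.verf = b"verf0001"
        self.page = page

    def readdir(self, cookie: int,
                cookieverf: bytes) -> Tuple[str, bytes, List[Entry], bool]:
        # A nonzero cookie must come with the verifier returned alongside it.
        if cookie != 0 and cookieverf != self.verf:
            return ("NFS4ERR_NOT_SAME", ZERO_VERF, [], False)
        rest = [e for e in self.entries if e.cookie > cookie]
        batch = rest[: self.page]
        eof = len(batch) == len(rest)
        return ("NFS4_OK", self.verf, batch, eof)


def list_directory(server: FakeServer) -> List[str]:
    """Read a whole directory by repeating READDIR until eof is TRUE."""
    cookie, verf = 0, ZERO_VERF
    names: List[str] = []
    while True:
        status, verf, batch, eof = server.readdir(cookie, verf)
        if status != "NFS4_OK":
            raise RuntimeError(status)
        names.extend(e.name for e in batch)
        if eof:
            return names
        cookie = batch[-1].cookie  # resume after the last entry received
```

| <t> | ||||
| The client treats the cookie as opaque: it only stores the last value | ||||
| received and hands it back together with the matching cookieverf. | ||||
| </t> | ||||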
| <t> | ||||
| The cookie value is only meaningful to the server and is used | ||||
| as a cursor for the directory entry. As mentioned, this cookie | ||||
| is used by the client for subsequent READDIR operations so that it may | ||||
| continue reading a directory. The cookie is similar in concept to a | ||||
| READ offset but <bcp14>MUST NOT</bcp14> be interpreted as such by the client. | ||||
| Ideally, the cookie value <bcp14>SHOULD NOT</bcp14> change if the directory is | ||||
| modified since the client may be caching these values. | ||||
| </t> | ||||
| <t> | ||||
| In some cases, the server may encounter an error while obtaining the | ||||
| attributes for a directory entry. Instead of returning an error for | ||||
| the entire READDIR operation, the server can instead return the | ||||
| attribute rdattr_error (<xref target="attrdef_rdattr_error" format="default"/>). With this, the server is able to | ||||
| communicate the failure to the client and not fail the entire | ||||
| operation in the instance of what might be a transient failure. | ||||
| Obviously, the client must request the fattr4_rdattr_error attribute | ||||
| for this method to work properly. If the client does not request the | ||||
| attribute, the server has no choice but to return failure for the | ||||
| entire READDIR operation. | ||||
| </t> | ||||
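| <t> | ||||
| The per-entry error reporting described above can be sketched as follows. | ||||
| The names AttrError, readdir_attrs, and demo_fetch are hypothetical, and | ||||
| attributes are modeled as plain dictionaries rather than encoded fattr4 | ||||
| values. | ||||
| </t> | ||||

```python
from typing import Callable, Dict, List, Set, Tuple

Attrs = Dict[str, str]


class AttrError(Exception):
    """Hypothetical exception carrying an nfsstat4 name."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status


def readdir_attrs(names: List[str], fetch: Callable[[str], Attrs],
                  requested: Set[str]) -> Tuple[str, List[Tuple[str, Attrs]]]:
    """Collect attributes per entry, reporting failures via rdattr_error."""
    entries: List[Tuple[str, Attrs]] = []
    for name in names:
        try:
            attrs = fetch(name)
        except AttrError as err:
            if "rdattr_error" not in requested:
                # No per-entry channel was requested: fail the whole READDIR.
                return (err.status, [])
            attrs = {"rdattr_error": err.status}
        entries.append((name, attrs))
    return ("NFS4_OK", entries)


def demo_fetch(name: str) -> Attrs:
    """Hypothetical fetcher that fails for one entry."""
    if name == "bad":
        raise AttrError("NFS4ERR_IO")
    return {"type": "NF4REG"}
```

| <t> | ||||
| When rdattr_error is among the requested attributes, the failing entry is | ||||
| still listed; otherwise, the same failure aborts the entire operation. | ||||
| </t> | ||||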
| <t> | ||||
| For some file system environments, the directory entries "." and ".." | ||||
| have special meaning, and in other environments, they do not. If the | ||||
| server supports these special entries within a directory, they <bcp14>SHOULD | ||||
| NOT</bcp14> be returned to the client as part of the READDIR response. To | ||||
| enable some client environments, the cookie values of zero, 1, and 2 are | ||||
| to be considered reserved. Note that the UNIX client will use these | ||||
| values when combining the server's response and local representations | ||||
| to enable a fully formed UNIX directory presentation to the | ||||
| application. | ||||
| </t> | ||||
| <t> | ||||
| For READDIR arguments, cookie values of one and two <bcp14>SHOULD NOT</bcp14> be used, and | ||||
| for READDIR results, cookie values of zero, one, and two <bcp14>SHOULD NOT</bcp14> be | ||||
| returned. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READDIR_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The server's file system directory representations | ||||
| can differ greatly. A client's programming | ||||
| interfaces may also be bound to the local operating | ||||
| environment in a way that does not translate well | ||||
| into the NFS protocol. Therefore, the | ||||
| dircount and maxcount fields are provided to enable | ||||
| the client to provide hints to the server. If the | ||||
| client is aggressive about attribute collection | ||||
| during a READDIR, the server has an idea of how to | ||||
| limit the encoded response. | ||||
| </t> | ||||
| <t> | ||||
| If dircount is zero, the server bounds the reply's | ||||
| size based on the request's maxcount field. | ||||
| </t> | ||||
| <t> | ||||
| The cookieverf may be used by the server to help manage cookie values | ||||
| that may become stale. It should be a rare occurrence that a server is | ||||
| unable to continue properly reading a directory with the provided | ||||
| cookie/cookieverf pair. The server <bcp14>SHOULD</bcp14> make every effort to avoid | ||||
| this condition since the application at the client might be unable to | ||||
| properly handle this type of failure. | ||||
| </t> | ||||
| <t> | ||||
| The use of the cookieverf will also protect the client from using | ||||
| READDIR cookie values that might be stale. For example, if the file | ||||
| system has been migrated, the server might or might not be able to use the | ||||
| same cookie values to service READDIR as the previous server used. | ||||
| With the client providing the cookieverf, the server is able to | ||||
| provide the appropriate response to the client. This prevents the | ||||
| case where the server accepts a cookie value but the underlying | ||||
| directory has changed and the response is invalid from the client's | ||||
| context of its previous READDIR. | ||||
| </t> | ||||
| <t> | ||||
| Since some servers will not be returning "." and ".." entries as has | ||||
| been done with previous versions of the NFS protocol, the client that | ||||
| requires these entries be present in READDIR responses must fabricate | ||||
| them. | ||||
| </t> | ||||
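| <t> | ||||
| A client that needs "." and ".." might fabricate them as in the sketch | ||||
| below. The function name full_listing is hypothetical, and assigning the | ||||
| reserved cookie values 1 and 2 to the fabricated entries is an assumption | ||||
| of this sketch about how a client could use those values locally. | ||||
| </t> | ||||

```python
from typing import List, Tuple

LOCAL_DOT_COOKIE = 1      # reserved value, used here for "."
LOCAL_DOTDOT_COOKIE = 2   # reserved value, used here for ".."


def full_listing(server_entries: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Prepend locally fabricated "." and ".." to the server's entries."""
    # Drop any "." or ".." a server sent anyway, so they never appear twice.
    remote = [(c, n) for c, n in server_entries if n not in (".", "..")]
    return [(LOCAL_DOT_COOKIE, "."), (LOCAL_DOTDOT_COOKIE, "..")] + remote
```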
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_READLINK" numbered="true" toc="default"> | ||||
| <name>Operation 27: READLINK - Read Symbolic Link</name> | ||||
| <section toc="exclude" anchor="OP_READLINK_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* CURRENT_FH: symlink */ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READLINK_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct READLINK4resok { | ||||
| linktext4 link; | ||||
| }; | ||||
| union READLINK4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| READLINK4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READLINK_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| READLINK reads the data associated with a symbolic | ||||
| link. Depending on the value of the UTF-8 capability | ||||
| attribute (<xref target="utf8_caps" format="default"/>), the data is encoded | ||||
| in UTF-8. | ||||
| Whether created by an NFS client or created locally | ||||
| on the server, the data in a symbolic link is not | ||||
| interpreted (except possibly to check for proper UTF-8 | ||||
| encoding) when created, but is simply stored. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_READLINK_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| A symbolic link is nominally a pointer to another file. The data is | ||||
| not necessarily interpreted by the server, just stored in the file. | ||||
| It is possible for a client implementation to store a pathname that | ||||
| is not meaningful to the server operating system in a symbolic link. | ||||
| A READLINK operation returns the data to the client for | ||||
| interpretation. If different implementations want to share access to | ||||
| symbolic links, then they must agree on the interpretation of the data | ||||
| in the symbolic link. | ||||
| </t> | ||||
| <t> | ||||
| The READLINK operation is only allowed on objects of type NF4LNK. | ||||
| The server should return the error NFS4ERR_WRONG_TYPE if the | ||||
| object is not of type NF4LNK. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_REMOVE" numbered="true" toc="default"> | ||||
| <name>Operation 28: REMOVE - Remove File System Object</name> | ||||
| <section toc="exclude" anchor="OP_REMOVE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct REMOVE4args { | ||||
| /* CURRENT_FH: directory */ | ||||
| component4 target; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_REMOVE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct REMOVE4resok { | ||||
| change_info4 cinfo; | ||||
| }; | ||||
| union REMOVE4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| REMOVE4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_REMOVE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The REMOVE operation removes (deletes) the directory entry named by | ||||
| target from the directory corresponding to the current filehandle. | ||||
| If the entry in the directory was the last reference to the | ||||
| corresponding file system object, the object may be destroyed. | ||||
| The directory may be either of type NF4DIR or NF4ATTRDIR. | ||||
| </t> | ||||
| <t> | ||||
| For the directory where the filename was removed, the server | ||||
| returns change_info4 information in cinfo. With the atomic field of | ||||
| the change_info4 data type, the server will indicate if the before and | ||||
| after change attributes were obtained atomically with respect to the | ||||
| removal. | ||||
| </t> | ||||
| <t> | ||||
| If the target has a length of zero, or if | ||||
| the target does not obey the UTF-8 definition (and | ||||
| the server is enforcing UTF-8 encoding; see <xref target="utf8_caps" format="default"/>), the error NFS4ERR_INVAL will | ||||
| be returned. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_REMOVE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| NFSv3 required a different operator RMDIR for directory | ||||
| removal and REMOVE for non-directory removal. This allowed clients to | ||||
| skip checking the file type when being passed a non-directory delete | ||||
| system call (e.g., <xref target="unlink" format="default">unlink()</xref> in POSIX) to remove a directory, as well as | ||||
| the converse (e.g., a rmdir() on a non-directory) because they knew the | ||||
| server would check the file type. NFSv4.1 REMOVE can be used to | ||||
| delete any directory entry independent of its file type. The | ||||
| implementor of an NFSv4.1 client's entry points from the | ||||
| unlink() and rmdir() system calls should first check the file type | ||||
| against the types the system call is allowed to remove before sending | ||||
| a REMOVE operation. Alternatively, the implementor can produce a COMPOUND call | ||||
| that includes a LOOKUP/VERIFY sequence of operations to verify the file type before | ||||
| a REMOVE operation in the same COMPOUND call. | ||||
| </t> | ||||
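| <t> | ||||
| The client-side type check recommended above can be sketched as follows. | ||||
| The function name precheck_remove and the string return values are | ||||
| hypothetical. The errno chosen for unlink() on a directory varies by | ||||
| platform: POSIX prescribes EPERM, while Linux returns EISDIR, which is | ||||
| what this sketch uses. | ||||
| </t> | ||||

```python
def precheck_remove(syscall: str, obj_type: str) -> str:
    """Return "SEND_REMOVE" or a local errno name, without a server round trip.

    obj_type is the nfs_ftype4 name obtained earlier, e.g. via GETATTR.
    """
    if syscall not in ("unlink", "rmdir"):
        raise ValueError(syscall)
    is_dir = obj_type == "NF4DIR"
    if syscall == "unlink" and is_dir:
        return "EISDIR"    # unlink() must not remove a directory
    if syscall == "rmdir" and not is_dir:
        return "ENOTDIR"   # rmdir() must not remove a non-directory
    return "SEND_REMOVE"
```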
| <t> | ||||
| The concept of last reference is server | ||||
| specific. However, if the numlinks field in the | ||||
| previous attributes of the object had the value 1, | ||||
| the client should not rely on referring to the | ||||
| object via a filehandle. Likewise, the client | ||||
| should not rely on the resources (disk space, | ||||
| directory entry, and so on) formerly associated | ||||
| with the object becoming immediately available. | ||||
| Thus, if a client needs to be able to continue to | ||||
| access a file after using REMOVE to remove it, the | ||||
| client should take steps to make sure that the file | ||||
| will still be accessible. While the traditional | ||||
| mechanism used is to RENAME the file from its old | ||||
| name to a new hidden name, the NFSv4.1 OPEN operation | ||||
| <bcp14>MAY</bcp14> return a result flag, OPEN4_RESULT_PRESERVE_UNLINKED, | ||||
| which indicates to the client that the file will be | ||||
| preserved if the file has an outstanding open (see <xref target="OP_OPEN" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| If the server finds that the file is still open when the REMOVE | ||||
| arrives: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The server <bcp14>SHOULD NOT</bcp14> delete the file's directory entry if the | ||||
| file was opened with OPEN4_SHARE_DENY_WRITE or | ||||
| OPEN4_SHARE_DENY_BOTH. | ||||
| </li> | ||||
| <li> | ||||
| If the file was not opened with OPEN4_SHARE_DENY_WRITE or | ||||
| OPEN4_SHARE_DENY_BOTH, the server <bcp14>SHOULD</bcp14> delete the file's | ||||
| directory entry. However, until the last CLOSE of the file, | ||||
| the server <bcp14>MAY</bcp14> continue to allow access to the file via | ||||
| its filehandle. | ||||
| </li> | ||||
| <li> | ||||
| The server <bcp14>MUST NOT</bcp14> delete the directory | ||||
| entry if the reply from OPEN had the flag | ||||
| OPEN4_RESULT_PRESERVE_UNLINKED set. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> implement its own restrictions on removal | ||||
| of a file while it is open. The server might disallow | ||||
| such a REMOVE (or a removal that occurs | ||||
| as part of RENAME). The conditions that influence the restrictions | ||||
| on removal of a file while it is still open include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Whether certain access protocols (i.e., not just | ||||
| NFS) are holding the file open. | ||||
| </li> | ||||
| <li> | ||||
| Whether particular options, access modes, or policies on the | ||||
| server are enabled. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If a file has an outstanding OPEN and this prevents the | ||||
| removal of the file's directory entry, | ||||
| the error NFS4ERR_FILE_OPEN is returned. | ||||
| </t> | ||||
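| <t> | ||||
| The share-deny part of this decision can be sketched as follows. The | ||||
| function name remove_open_file is hypothetical. The sketch assumes a | ||||
| server that follows both SHOULD-level recommendations, and it does not | ||||
| model OPEN4_RESULT_PRESERVE_UNLINKED, other access protocols, or | ||||
| outstanding delegations. | ||||
| </t> | ||||

```python
from typing import Iterable

DENY_BLOCKING = {"OPEN4_SHARE_DENY_WRITE", "OPEN4_SHARE_DENY_BOTH"}


def remove_open_file(open_deny_modes: Iterable[str]) -> str:
    """Status for a REMOVE whose target is still open.

    open_deny_modes lists the share_deny value of each outstanding OPEN.
    """
    if any(mode in DENY_BLOCKING for mode in open_deny_modes):
        # The directory entry is kept and the caller is told why.
        return "NFS4ERR_FILE_OPEN"
    # The entry is deleted; access via filehandle may continue until CLOSE.
    return "NFS4_OK"
```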
| <t> | ||||
| Where the determination above cannot be made | ||||
| definitively because delegations are being held, | ||||
| they <bcp14>MUST</bcp14> be recalled to allow processing of the | ||||
| REMOVE to continue. When a delegation is held, | ||||
| the server has no reliable knowledge of the status of OPENs for | ||||
| that client, so unless | ||||
| there are files opened with the particular deny modes | ||||
| by clients without delegations, the determination | ||||
| cannot be made until delegations are recalled, and | ||||
| the operation cannot proceed until each sufficient | ||||
| delegation has been returned or revoked to allow | ||||
| the server to make a correct determination. | ||||
| </t> | ||||
| <t> | ||||
| In all cases in which delegations are recalled, the server | ||||
| is likely to return one or more NFS4ERR_DELAY errors while | ||||
| delegations remain outstanding. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle designates a directory for | ||||
| which another client holds a directory delegation, | ||||
| then, unless the situation can be resolved by sending | ||||
| a notification, the directory delegation <bcp14>MUST</bcp14> be | ||||
| recalled, and the operation <bcp14>MUST NOT</bcp14> proceed until | ||||
| the delegation is returned or revoked. Except where | ||||
| this happens very quickly, one or more NFS4ERR_DELAY | ||||
| errors will be returned to requests made while | ||||
| the delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| When the current filehandle designates a directory | ||||
| for which one or more directory delegations | ||||
| exist, then, when those delegations request | ||||
| such notifications, NOTIFY4_REMOVE_ENTRY will be | ||||
| generated as a result of this operation. | ||||
| </t> | ||||
| <t> | ||||
| Note that when a remove occurs as a result of a | ||||
| RENAME, NOTIFY4_REMOVE_ENTRY will only be generated | ||||
| if the removal happens as a separate operation. | ||||
| In the case in which the removal is integrated and | ||||
| atomic with RENAME, the notification of the removal | ||||
| is integrated with notification for the RENAME. See | ||||
| the discussion of the NOTIFY4_RENAME_ENTRY | ||||
| notification in <xref target="OP_CB_NOTIFY" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_RENAME" numbered="true" toc="default"> | ||||
| <name>Operation 29: RENAME - Rename Directory Entry</name> | ||||
| <section toc="exclude" anchor="OP_RENAME_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct RENAME4args { | ||||
| /* SAVED_FH: source directory */ | ||||
| component4 oldname; | ||||
| /* CURRENT_FH: target directory */ | ||||
| component4 newname; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RENAME_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct RENAME4resok { | ||||
| change_info4 source_cinfo; | ||||
| change_info4 target_cinfo; | ||||
| }; | ||||
| union RENAME4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| RENAME4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RENAME_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The RENAME operation renames the object identified by oldname in the | ||||
| source directory corresponding to the saved filehandle, as set by the | ||||
| SAVEFH operation, to newname in the target directory corresponding to | ||||
| the current filehandle. The operation is required to be atomic to the | ||||
| client. Source and target directories <bcp14>MUST</bcp14> reside on the same | ||||
| file system on the server. On success, the current filehandle will | ||||
| continue to be the target directory. | ||||
| </t> | ||||
| <t> | ||||
| If the target directory already contains an entry with the name | ||||
| newname, the source object <bcp14>MUST</bcp14> be compatible with the target: either | ||||
| both are non-directories or both are directories and the target <bcp14>MUST</bcp14> | ||||
| be empty. | ||||
| If compatible, the existing target is removed before the | ||||
| rename occurs or, preferably, the target is removed atomically as | ||||
| part of the rename. | ||||
| See <xref target="OP_REMOVE_IMPLEMENTATION" format="default"/> | ||||
| for client and server actions whenever a target is removed. | ||||
| Note however that when the removal is performed atomically with the | ||||
| rename, certain parts of the removal described there are integrated | ||||
| with the rename. For example, notification of the removal will not | ||||
| be via a NOTIFY4_REMOVE_ENTRY but will be indicated as part of the | ||||
| NOTIFY4_ADD_ENTRY or NOTIFY4_RENAME_ENTRY generated by the rename. | ||||
| </t> | ||||
| <t> | ||||
| If the source object and the target are not | ||||
| compatible or if the target is a directory but not empty, the server | ||||
| will return the error NFS4ERR_EXIST. | ||||
| </t> | ||||
| <t> | ||||
| If oldname and newname both refer to the same | ||||
| file (e.g., they might be hard links of each | ||||
| other), then unless the file is open (see <xref target="OP_RENAME_IMPLEMENTATION" format="default"/>), RENAME <bcp14>MUST</bcp14> | ||||
| perform no action and return NFS4_OK. | ||||
| </t> | ||||
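| <t> | ||||
| The target compatibility rules above can be sketched as a single check. | ||||
| The Obj type, its fields, and the outcome strings RENAME, REPLACE, and | ||||
| NOOP are hypothetical. The sketch ignores open-file restrictions and | ||||
| delegations, which are covered in the IMPLEMENTATION section. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    fileid: int
    is_dir: bool
    child_count: int = 0  # only meaningful for directories


def rename_target_check(source: Obj, target: Optional[Obj]) -> str:
    """Classify a RENAME by what already exists under newname."""
    if target is None:
        return "RENAME"          # nothing to replace
    if target.fileid == source.fileid:
        return "NOOP"            # same file: no action, reply NFS4_OK
    if source.is_dir != target.is_dir:
        return "NFS4ERR_EXIST"   # directory versus non-directory
    if target.is_dir and target.child_count > 0:
        return "NFS4ERR_EXIST"   # target directory is not empty
    return "REPLACE"             # compatible: target is removed
```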
| <t> | ||||
| For both directories involved in the RENAME, the server returns | ||||
| change_info4 information. With the atomic field of the change_info4 | ||||
| data type, the server will indicate if the before and after change | ||||
| attributes were obtained atomically with respect to the rename. | ||||
| </t> | ||||
| <t> | ||||
| If oldname refers to a named attribute and the saved and current | ||||
| filehandles refer to different file system objects, the server will | ||||
| return NFS4ERR_XDEV just as if the saved and current filehandles | ||||
| represented directories on different file systems. | ||||
| </t> | ||||
| <t> | ||||
| If oldname or newname has a length of zero, or if oldname or | ||||
| newname does not obey the UTF-8 definition, the error NFS4ERR_INVAL | ||||
| will be returned. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RENAME_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The server <bcp14>MAY</bcp14> impose restrictions on the RENAME | ||||
| operation such that RENAME may not be done when the | ||||
| file being renamed is open or when that open is done | ||||
| by particular protocols, or with particular options | ||||
| or access modes. Similar restrictions may be applied | ||||
| when a file exists with the target name and is open. | ||||
| When RENAME is rejected because of such restrictions, | ||||
| the error NFS4ERR_FILE_OPEN is returned. | ||||
| </t> | ||||
| <t> | ||||
| When oldname and newname refer to the same file and | ||||
| that file is open in a fashion such that RENAME | ||||
| would normally be rejected with NFS4ERR_FILE_OPEN | ||||
| if oldname and newname were different files, then | ||||
| RENAME <bcp14>SHOULD</bcp14> be rejected with NFS4ERR_FILE_OPEN. | ||||
| </t> | ||||
| <t> | ||||
| If a server does implement such restrictions and those restrictions | ||||
| include cases of NFSv4 opens preventing successful execution of | ||||
| a rename, the server needs to recall any delegations that could | ||||
| hide the existence of opens relevant to that decision. This is | ||||
| because when a client holds a delegation, the server | ||||
| might not have an accurate account of the opens for that client, since | ||||
| the client may execute OPENs and CLOSEs locally. The RENAME operation | ||||
| need only be delayed until a definitive result can be obtained. For | ||||
| example, if there are multiple delegations and one of them establishes | ||||
| an open whose presence would prevent the rename, given the server's | ||||
| semantics, NFS4ERR_FILE_OPEN may be returned to the caller as soon | ||||
| as that delegation is returned without waiting for other delegations | ||||
| to be returned. Similarly, if such opens are not associated with | ||||
| delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | ||||
| delegation recall being done. | ||||
| </t> | ||||
| <t> | ||||
| If the current filehandle or the saved filehandle designates a | ||||
| directory for which another client holds a directory delegation, | ||||
| then, unless the situation can be resolved by sending a notification, | ||||
| the delegation <bcp14>MUST</bcp14> be recalled, and the operation cannot proceed | ||||
| until the delegation is returned or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| </t> | ||||
| <t> | ||||
| When the current and saved filehandles are the | ||||
| same and they designate a directory for which one | ||||
| or more directory delegations exist, then, when | ||||
| those delegations request such notifications, | ||||
| a notification of type NOTIFY4_RENAME_ENTRY | ||||
| will be generated as a result of this operation. | ||||
| When oldname and newname refer to the same file, | ||||
| no notification is generated (because, as <xref target="OP_RENAME_DESCRIPTION" format="default"/> states, the server | ||||
| <bcp14>MUST</bcp14> take no action). When a file is removed | ||||
| because it has the same name as the target, if | ||||
| that removal is done atomically with the rename, | ||||
| a NOTIFY4_REMOVE_ENTRY notification will not be | ||||
| generated. Instead, the deletion of the file will | ||||
| be reported as part of the NOTIFY4_RENAME_ENTRY | ||||
| notification. | ||||
| </t> | ||||
| <t> | ||||
| When the current and saved filehandles are not the same: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the current filehandle designates a directory for which | ||||
| one or more directory delegations exist, then, when those | ||||
| delegations request such notifications, NOTIFY4_ADD_ENTRY | ||||
| will be generated as a result of this operation. When a file | ||||
| is removed because it has the same name as the target, if that | ||||
| removal is done atomically with the rename, a | ||||
| NOTIFY4_REMOVE_ENTRY notification will not be generated. | ||||
| Instead, the deletion of the file will be reported as part | ||||
| of the NOTIFY4_ADD_ENTRY notification. | ||||
| </li> | ||||
| <li> | ||||
| If the saved filehandle designates a directory for which | ||||
| one or more directory delegations exist, then, when those | ||||
| delegations request such notifications, NOTIFY4_REMOVE_ENTRY | ||||
| will be generated as a result of this operation. | ||||
| </li> | ||||
| </ul> | ||||
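| <t> | ||||
| The notification rules for RENAME can be summarized in a short sketch. | ||||
| The function name rename_notifications and the "source"/"target" labels | ||||
| are hypothetical, and the sketch assumes that every directory delegation | ||||
| involved has requested the corresponding notification type. | ||||
| </t> | ||||

```python
from typing import List, Tuple


def rename_notifications(same_dir: bool, same_file: bool = False,
                         replaced: bool = False,
                         atomic: bool = True) -> List[Tuple[str, str]]:
    """Return (directory, notification) pairs generated by one RENAME."""
    if same_file:
        return []  # the server takes no action, so nothing is reported
    events: List[Tuple[str, str]] = []
    if replaced and not atomic:
        # The existing target was removed as a separate step.
        events.append(("target", "NOTIFY4_REMOVE_ENTRY"))
    if same_dir:
        events.append(("target", "NOTIFY4_RENAME_ENTRY"))
    else:
        events.append(("source", "NOTIFY4_REMOVE_ENTRY"))
        events.append(("target", "NOTIFY4_ADD_ENTRY"))
    return events
```

| <t> | ||||
| When the replacement is atomic, no separate NOTIFY4_REMOVE_ENTRY appears | ||||
| for the replaced file; its deletion travels inside the rename or add | ||||
| notification. | ||||
| </t> | ||||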
| <t> | ||||
| If the object being renamed has file delegations | ||||
| held by clients other than the one doing the RENAME, | ||||
| the delegations <bcp14>MUST</bcp14> be recalled, and the | ||||
| operation cannot proceed | ||||
| until each such delegation is returned | ||||
| or revoked. Note that in the case of multiply linked files, | ||||
| the delegation recall requirement applies even if the | ||||
| delegation was obtained through a different name than the | ||||
| one being renamed. | ||||
| In all cases in which delegations are recalled, the server | ||||
| is likely to return one or more NFS4ERR_DELAY errors while the | ||||
| delegations remain outstanding, although it might not do that if the | ||||
| delegations are returned quickly. | ||||
| </t> | ||||
| <t> | ||||
| The RENAME operation must be atomic to the client. The statement | ||||
| "source and target directories <bcp14>MUST</bcp14> reside on the same file system | ||||
| on the server" | ||||
| means that the fsid fields in the attributes for the | ||||
| directories are the same. If they reside on different file systems, | ||||
| the error NFS4ERR_XDEV is returned. | ||||
| </t> | ||||
| <t> | ||||
| Based on the value of the fh_expire_type attribute for the object, the | ||||
| filehandle may or may not expire on a RENAME. However, server | ||||
| implementors are strongly encouraged to attempt to keep filehandles | ||||
| from expiring in this fashion. | ||||
| </t> | ||||
| <t> | ||||
| On some servers, the file names "." and ".." are illegal as either | ||||
| oldname or newname, and will result in the error NFS4ERR_BADNAME. | ||||
| In addition, on many servers the case of oldname or newname being | ||||
| an alias for the source directory will be checked for. Such servers | ||||
| will return the error NFS4ERR_INVAL in these cases. | ||||
| </t> | ||||
| <t> | ||||
| If either the source or target filehandle does not designate a directory, the | ||||
| server will return NFS4ERR_NOTDIR. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_RESTOREFH" numbered="true" toc="default"> | ||||
| <name>Operation 31: RESTOREFH - Restore Saved Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_RESTOREFH_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* SAVED_FH: */ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RESTOREFH_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct RESTOREFH4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new CURRENT_FH: value of saved fh | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RESTOREFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The RESTOREFH operation sets the current filehandle and stateid to the values in the | ||||
| saved filehandle and stateid. If | ||||
| there is no saved filehandle, then the server will | ||||
| return the error NFS4ERR_NOFILEHANDLE. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_filehandle" format="default"/> for more details on the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_stateid" format="default"/> for more details on the current | ||||
| stateid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RESTOREFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Operations like OPEN and LOOKUP use the current filehandle | ||||
| to represent a directory and replace it with a new filehandle. | ||||
| Assuming that the previous filehandle was saved with a SAVEFH operator, | ||||
| the previous filehandle can be restored as the current filehandle. | ||||
| This is commonly used to obtain post-operation attributes for | ||||
| the directory, e.g., | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| PUTFH (directory filehandle) | ||||
| SAVEFH | ||||
| GETATTR attrbits (pre-op dir attrs) | ||||
| CREATE optbits "foo" attrs | ||||
| GETATTR attrbits (file attributes) | ||||
| RESTOREFH | ||||
| GETATTR attrbits (post-op dir attrs)]]></sourcecode> | ||||
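| <t> | ||||
| The effect of SAVEFH and RESTOREFH on the per-COMPOUND state can be | ||||
| modeled with a small sketch. The class CompoundState and its methods are | ||||
| hypothetical. Resetting the stateid to None in putfh is a simplification | ||||
| of the current stateid rules, and returning NFS4ERR_NOFILEHANDLE from | ||||
| savefh when no current filehandle exists is an assumption of this sketch. | ||||
| </t> | ||||

```python
from typing import Optional, Tuple

FhState = Tuple[str, Optional[str]]  # (filehandle, stateid)


class CompoundState:
    """Hypothetical model of the current and saved filehandle slots."""

    def __init__(self) -> None:
        self.current: Optional[FhState] = None
        self.saved: Optional[FhState] = None

    def putfh(self, fh: str) -> str:
        self.current = (fh, None)
        return "NFS4_OK"

    def savefh(self) -> str:
        if self.current is None:
            return "NFS4ERR_NOFILEHANDLE"
        # Any previously saved filehandle becomes inaccessible.
        self.saved = self.current
        return "NFS4_OK"

    def restorefh(self) -> str:
        if self.saved is None:
            return "NFS4ERR_NOFILEHANDLE"
        # Filehandle and stateid are restored together.
        self.current = self.saved
        return "NFS4_OK"
```

| <t> | ||||
| In the sequence shown above, the CREATE replaces the current filehandle | ||||
| with that of the new object, and RESTOREFH brings back the directory | ||||
| saved earlier. | ||||
| </t> | ||||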
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SAVEFH" numbered="true" toc="default"> | ||||
| <name>Operation 32: SAVEFH - Save Current Filehandle</name> | ||||
| <section toc="exclude" anchor="OP_SAVEFH_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* CURRENT_FH: */ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SAVEFH_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct SAVEFH4res { | ||||
| /* | ||||
| * If status is NFS4_OK, | ||||
| * new SAVED_FH: value of current fh | ||||
| */ | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SAVEFH_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The SAVEFH operation saves the current filehandle and stateid. | ||||
| If a previous filehandle was saved, then | ||||
| it is no longer accessible. The saved filehandle can be restored as | ||||
| the current filehandle with the RESTOREFH operator. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_filehandle" format="default"/> for more details on the | ||||
| current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="current_stateid" format="default"/> for more details on the current | ||||
| stateid. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SAVEFH_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SECINFO" numbered="true" toc="default"> | ||||
| <name>Operation 33: SECINFO - Obtain Available Security</name> | ||||
| <section toc="exclude" anchor="OP_SECINFO_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct SECINFO4args { | ||||
| /* CURRENT_FH: directory */ | ||||
| component4 name; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * From RFC 2203 | ||||
| */ | ||||
| enum rpc_gss_svc_t { | ||||
| RPC_GSS_SVC_NONE = 1, | ||||
| RPC_GSS_SVC_INTEGRITY = 2, | ||||
| RPC_GSS_SVC_PRIVACY = 3 | ||||
| }; | ||||
| struct rpcsec_gss_info { | ||||
| sec_oid4 oid; | ||||
| qop4 qop; | ||||
| rpc_gss_svc_t service; | ||||
| }; | ||||
| /* RPCSEC_GSS has a value of '6' - See RFC 2203 */ | ||||
| union secinfo4 switch (uint32_t flavor) { | ||||
| case RPCSEC_GSS: | ||||
| rpcsec_gss_info flavor_info; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| typedef secinfo4 SECINFO4resok<>; | ||||
| union SECINFO4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| /* CURRENT_FH: consumed */ | ||||
| SECINFO4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The SECINFO operation is used by the client to obtain a list of | ||||
| valid RPC authentication flavors for a specific directory | ||||
| filehandle, file name pair. SECINFO should apply the same | ||||
| access methodology used for LOOKUP when evaluating the name. | ||||
| Therefore, if the requester does not have the appropriate access | ||||
| to LOOKUP the name, then SECINFO <bcp14>MUST</bcp14> behave the same way and | ||||
| return NFS4ERR_ACCESS. | ||||
| </t> | ||||
| <t> | ||||
| The result will contain an array that represents the security | ||||
| mechanisms available, with an order corresponding to the | ||||
| server's preferences, the most preferred being first in the | ||||
| array. The client is free to pick whatever security mechanism it | ||||
| both desires and supports, or to pick in the server's preference | ||||
| order the first one it supports. The array entries are | ||||
| represented by the secinfo4 structure. The field 'flavor' will | ||||
| contain a value of AUTH_NONE, AUTH_SYS (as defined in <xref target="RFC5531" format="default">RFC 5531</xref>), or RPCSEC_GSS (as defined in | ||||
| <xref target="RFC2203" format="default">RFC 2203</xref>). The field flavor can | ||||
| also be any other security flavor registered with IANA. | ||||
| </t> | ||||
| <t> | ||||
| For the flavors AUTH_NONE and AUTH_SYS, no additional security | ||||
| information is returned. The same is true of many (if not most) | ||||
| other security flavors, including AUTH_DH. For a return value of | ||||
| RPCSEC_GSS, a security triple is returned that contains the | ||||
| mechanism object identifier (OID, as defined in <xref target="RFC2743" format="default">RFC 2743</xref>), the quality of protection (as | ||||
| defined in <xref target="RFC2743" format="default">RFC 2743</xref>), and the | ||||
| service type (as defined in <xref target="RFC2203" format="default">RFC 2203</xref>). It is possible for SECINFO to | ||||
| return multiple entries with flavor equal to RPCSEC_GSS with | ||||
| different security triple values. | ||||
| </t> | ||||
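| <t> | ||||
| The selection rule described above can be sketched as follows. This is a | ||||
| minimal Python illustration and not part of the protocol: SecInfoEntry, | ||||
| pick_first_supported, and the client's sets of supported flavors and | ||||
| triples are hypothetical names. The flavor numbers are those of RFC 5531 | ||||
| and RFC 2203. The sketch takes the second option mentioned above: the first | ||||
| entry, in server preference order, that the client supports. | ||||
| </t> | ||||

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple

AUTH_NONE = 0    # RFC 5531
AUTH_SYS = 1     # RFC 5531
RPCSEC_GSS = 6   # RFC 2203

# (oid, qop, service), mirroring the rpcsec_gss_info fields.
Triple = Tuple[bytes, int, int]


@dataclass(frozen=True)
class SecInfoEntry:
    """Hypothetical in-memory form of one secinfo4 array element."""
    flavor: int
    triple: Optional[Triple] = None  # present only for RPCSEC_GSS


def pick_first_supported(
    entries: Sequence[SecInfoEntry],
    supported_flavors: Set[int],
    supported_triples: Set[Triple],
) -> Optional[SecInfoEntry]:
    """Return the first entry the client supports, or None if there is none.

    The entries are assumed to be in the order returned by SECINFO,
    that is, most preferred by the server first.
    """
    for entry in entries:
        if entry.flavor == RPCSEC_GSS:
            # RPCSEC_GSS entries are distinguished by their security triple.
            if entry.triple in supported_triples:
                return entry
        elif entry.flavor in supported_flavors:
            return entry
    return None
```

| <t> | ||||
| A client that prefers its own ordering could instead filter the array down | ||||
| to the entries it supports and rank those by local policy. | ||||
| </t> | ||||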
| <t> | ||||
| On success, the current filehandle is consumed (see | ||||
| <xref target="aftersecinfo" format="default"/>), and if the | ||||
| next operation after SECINFO tries to use the current filehandle, | ||||
| that operation will fail with the status NFS4ERR_NOFILEHANDLE. | ||||
| </t> | ||||
| <t> | ||||
| If the name has a length of zero, or if the name does not obey | ||||
| the UTF-8 definition (assuming UTF-8 capabilities are enabled; see | ||||
| <xref target="utf8_caps" format="default"/>), the error NFS4ERR_INVAL will be returned. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="Security_Service_Negotiation" format="default"/> | ||||
| for additional information on the use of SECINFO. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The SECINFO operation is expected to be used by the NFS client | ||||
| when the error value of NFS4ERR_WRONGSEC is returned from | ||||
| another NFS operation. This signifies to the client that the | ||||
| server's security policy is different from what the client is | ||||
| currently using. At this point, the client is expected to | ||||
| obtain a list of possible security flavors and choose what best | ||||
| suits its policies. | ||||
| </t> | ||||
| <t> | ||||
| As mentioned, the server's security | ||||
| policies will determine when a client | ||||
| request receives NFS4ERR_WRONGSEC. See <xref target="error_op_returns" format="default"/> for a list of operations | ||||
| that can return NFS4ERR_WRONGSEC. In addition, | ||||
| when READDIR returns attributes, the rdattr_error | ||||
| (<xref target="attrdef_rdattr_error" format="default"/>) | ||||
| can contain NFS4ERR_WRONGSEC. Note that CREATE and | ||||
| REMOVE <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC. The | ||||
| rationale for CREATE is that unless the | ||||
| target name exists, it cannot have a separate | ||||
| security policy from the parent directory, | ||||
| and the security policy of the parent was | ||||
| checked when its filehandle was injected into | ||||
| the COMPOUND request's operations stream (for | ||||
| similar reasons, an OPEN operation that creates | ||||
| the target <bcp14>MUST NOT</bcp14> return NFS4ERR_WRONGSEC). If | ||||
| the target name exists, while it might have a | ||||
| separate security policy, that is irrelevant | ||||
| because CREATE <bcp14>MUST</bcp14> return NFS4ERR_EXIST. | ||||
| The rationale for REMOVE is that while that | ||||
| target might have a separate security policy, the | ||||
| target is going to be removed, and so the | ||||
| security policy of the parent trumps that of the | ||||
| object being removed. RENAME and LINK <bcp14>MAY</bcp14> return | ||||
| NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error | ||||
| applies only to the saved filehandle (see <xref target="link_rename" format="default"/>). Any NFS4ERR_WRONGSEC | ||||
| error on the current filehandle used by LINK and | ||||
| RENAME <bcp14>MUST</bcp14> be returned by the PUTFH, PUTPUBFH, | ||||
| PUTROOTFH, or RESTOREFH operation that injected | ||||
| the current filehandle. | ||||
| </t> | ||||
| <t> | ||||
| With the exception of LINK and RENAME, | ||||
| the set of operations that can return NFS4ERR_WRONGSEC | ||||
| represents the point at which the client can inject a | ||||
| filehandle into the "current filehandle" at the server. The | ||||
| filehandle is either provided by the client (PUTFH, PUTPUBFH, | ||||
| PUTROOTFH), generated as a result of a name-to-filehandle | ||||
| translation (LOOKUP and OPEN), or generated from the saved filehandle | ||||
| via RESTOREFH. As <xref target="PUTFHplusSAVEFH" format="default"/> states, | ||||
| a put filehandle operation followed by SAVEFH <bcp14>MUST NOT</bcp14> | ||||
| return NFS4ERR_WRONGSEC. Thus, the RESTOREFH operation, under | ||||
| certain conditions (see <xref target="putfh_series" format="default"/>), is | ||||
| permitted to return NFS4ERR_WRONGSEC so that security policies | ||||
| can be honored. | ||||
| </t> | ||||
| <t> | ||||
| The READDIR operation will not directly return the | ||||
| NFS4ERR_WRONGSEC error. However, if the READDIR request | ||||
| included a request for attributes, it is possible that the | ||||
| READDIR request's security triple did not match that of a | ||||
| directory entry. If this is the case and the client has | ||||
| requested the rdattr_error attribute, the server will return the | ||||
| NFS4ERR_WRONGSEC error in rdattr_error for the entry. | ||||
| </t> | ||||
| <t> | ||||
| To resolve an error return of | ||||
| NFS4ERR_WRONGSEC, the client does the following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For LOOKUP and OPEN, the client will use SECINFO with the | ||||
| same current filehandle and name as provided in the | ||||
| original LOOKUP or OPEN to enumerate the available security | ||||
| triples. | ||||
| </li> | ||||
| <li> | ||||
| For the rdattr_error, the client will use | ||||
| SECINFO with the same current filehandle | ||||
| as provided in the original READDIR. The | ||||
| name passed to SECINFO will be that of the | ||||
| directory entry (as returned from READDIR) | ||||
| that had the NFS4ERR_WRONGSEC error in the | ||||
| rdattr_error attribute. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| For PUTFH, PUTROOTFH, PUTPUBFH, | ||||
| RESTOREFH, LINK, and RENAME, the client will | ||||
| use SECINFO_NO_NAME { style = | ||||
| SECINFO_STYLE4_CURRENT_FH }. The client | ||||
| will prefix the SECINFO_NO_NAME operation | ||||
| with the appropriate PUTFH, PUTPUBFH, | ||||
| or PUTROOTFH operation that provides the | ||||
| filehandle originally provided by the PUTFH, | ||||
| PUTPUBFH, PUTROOTFH, or RESTOREFH operation. | ||||
| </t> | ||||
| <t> | ||||
| NOTE: In NFSv4.0, the client was required | ||||
| to use SECINFO, and had to reconstruct the | ||||
| parent of the original filehandle and the | ||||
| component name of the original filehandle. The | ||||
| introduction in NFSv4.1 of SECINFO_NO_NAME | ||||
| obviates the need for reconstruction. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| For LOOKUPP, the client will | ||||
| use SECINFO_NO_NAME { style = | ||||
| SECINFO_STYLE4_PARENT } and provide the | ||||
| filehandle that equals the filehandle | ||||
| originally provided to LOOKUPP. | ||||
| </li> | ||||
| </ul> | ||||
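| <t> | ||||
| The list above amounts to a small dispatch table. The following Python | ||||
| sketch restates it; the function name wrongsec_recovery and the use of | ||||
| strings for operation names are illustrative assumptions, and "READDIR" | ||||
| stands for the case in which NFS4ERR_WRONGSEC arrived in rdattr_error. | ||||
| </t> | ||||

```python
from typing import Optional, Tuple


def wrongsec_recovery(failed_op: str) -> Tuple[str, Optional[str]]:
    """Map the operation that reported NFS4ERR_WRONGSEC to the query to send.

    Returns (operation, style). The style is None for SECINFO, which takes
    a directory filehandle and a component name instead of a style.
    """
    if failed_op in ("LOOKUP", "OPEN", "READDIR"):
        # Same current filehandle as the original request; the name is the
        # original name, or the directory entry name for READDIR.
        return ("SECINFO", None)
    if failed_op in ("PUTFH", "PUTROOTFH", "PUTPUBFH",
                     "RESTOREFH", "LINK", "RENAME"):
        return ("SECINFO_NO_NAME", "SECINFO_STYLE4_CURRENT_FH")
    if failed_op == "LOOKUPP":
        return ("SECINFO_NO_NAME", "SECINFO_STYLE4_PARENT")
    raise ValueError("no WRONGSEC recovery rule for " + failed_op)
```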
| <t> | ||||
| See <xref target="SECCON" format="default"/> for a discussion on | ||||
| the recommendations for the security flavor used by SECINFO and | ||||
| SECINFO_NO_NAME. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SETATTR" numbered="true" toc="default"> | ||||
| <name>Operation 34: SETATTR - Set Attributes</name> | ||||
| <section toc="exclude" anchor="OP_SETATTR_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct SETATTR4args { | ||||
| /* CURRENT_FH: target object */ | ||||
| stateid4 stateid; | ||||
| fattr4 obj_attributes; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SETATTR_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct SETATTR4res { | ||||
| nfsstat4 status; | ||||
| bitmap4 attrsset; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SETATTR_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The SETATTR operation changes one or more of the attributes of a | ||||
| file system object. The new attributes are specified with a bitmap and | ||||
| the attributes that follow the bitmap in bit order. | ||||
| </t> | ||||
| <t> | ||||
| The stateid argument for SETATTR is used to provide byte-range locking | ||||
| context that is necessary for SETATTR requests that set the size | ||||
| attribute. Since setting the size attribute modifies the file's data, | ||||
| it has the same locking requirements as a corresponding WRITE. Any | ||||
| SETATTR that sets the size attribute is incompatible with a share | ||||
| reservation that specifies OPEN4_SHARE_DENY_WRITE. The area between the old | ||||
| end-of-file and the new end-of-file is considered to be modified just | ||||
| as would have been the case had the area in question been specified as | ||||
| the target of WRITE, for the purpose of checking conflicts with byte-range | ||||
| locks, for those cases in which a server is implementing mandatory | ||||
| byte-range locking behavior. A valid stateid <bcp14>SHOULD</bcp14> always be specified. | ||||
| When the file size attribute is not set, the special stateid | ||||
| consisting of all bits equal to zero <bcp14>MAY</bcp14> be passed. | ||||
| </t> | ||||
| <t> | ||||
| On either success or failure of the operation, the server will return | ||||
| the attrsset bitmask to represent what (if any) attributes were | ||||
| successfully set. The attrsset in the response is a subset of the | ||||
| attrmask field of the obj_attributes field in the argument. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SETATTR_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the request specifies the owner attribute to be set, the server | ||||
| <bcp14>SHOULD</bcp14> allow the operation to succeed if the current owner of the | ||||
| object matches the value specified in the request. Some servers may | ||||
| be implemented in such a way as to prohibit the setting of the owner | ||||
| attribute unless the requester has privilege to do so. If the server | ||||
| is lenient in this one case of matching owner values, the client | ||||
| implementation may be simplified in cases of creation of an object | ||||
| (e.g., an exclusive create via OPEN) | ||||
| followed by a SETATTR. | ||||
| </t> | ||||
| <t> | ||||
| The file size attribute is used to request changes | ||||
| to the size of a file. A value of zero causes the | ||||
| file to be truncated, a value less than the current | ||||
| size of the file causes data from the new size to the | ||||
| end of the file to be discarded, and a size greater | ||||
| than the current size of the file causes logically | ||||
| zeroed data bytes to be added to the end of the | ||||
| file. Servers are free to implement this using | ||||
| unallocated bytes (holes) or allocated data bytes | ||||
| set to zero. Clients should not make any assumptions | ||||
| regarding a server's implementation of this feature, | ||||
| beyond that the bytes in the affected byte-range returned by | ||||
| READ will be zeroed. Servers <bcp14>MUST</bcp14> support extending | ||||
| the file size via SETATTR. | ||||
| </t> | ||||
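| <t> | ||||
| The effect of setting the size attribute, as observed through a later READ, | ||||
| can be modeled as follows. This Python sketch is an illustration only: | ||||
| apply_size is a hypothetical helper, and it says nothing about whether a | ||||
| server stores the extended range as holes or as allocated zero bytes. | ||||
| </t> | ||||

```python
def apply_size(contents: bytes, new_size: int) -> bytes:
    """Return the file contents a READ would observe after a size change."""
    if 0 > new_size:
        raise ValueError("size must be non-negative")
    if len(contents) >= new_size:
        # Truncation: data from new_size to the old end of file is discarded.
        return contents[:new_size]
    # Extension: the added range reads back as zero bytes.
    return contents + bytes(new_size - len(contents))
```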
| <t> | ||||
| SETATTR is not guaranteed to be atomic. A failed SETATTR may partially | ||||
| change a file's attributes, hence the reason why the reply always | ||||
| includes the status and the list of attributes that were set. | ||||
| </t> | ||||
| <t> | ||||
| If the object whose attributes are being changed has a file delegation | ||||
| that is held by a client other than the one doing the SETATTR, | ||||
| the delegation(s) must be recalled, and the | ||||
| operation cannot proceed to actually change an attribute | ||||
| until each such delegation is returned | ||||
| or revoked. | ||||
| In all cases in which delegations are recalled, the server | ||||
| is likely to return one or more NFS4ERR_DELAY errors while the | ||||
| delegation(s) remain outstanding, although it might not do so if the | ||||
| delegations are returned quickly. | ||||
| </t> | ||||
| <t> | ||||
| If the object whose attributes are being set is a directory | ||||
| and another client holds a directory delegation for that | ||||
| directory, then if enabled, asynchronous notifications will be generated | ||||
| when the set of attributes changed has a non-null intersection | ||||
| with the set of attributes for which notification is requested. | ||||
| Notifications of type NOTIFY4_CHANGE_DIR_ATTRS will be sent to | ||||
| the appropriate client(s), but the SETATTR is not delayed by | ||||
| waiting for these notifications to be sent. | ||||
| </t> | ||||
| <t> | ||||
| If the object whose attributes are being set is a member of | ||||
| the directory for which another client holds a directory delegation, | ||||
| then asynchronous notifications will be generated | ||||
| when the set of attributes changed has a non-null intersection | ||||
| with the set of attributes for which notification is requested. | ||||
| Notifications of type NOTIFY4_CHANGE_CHILD_ATTRS will be sent to | ||||
| the appropriate clients, but the SETATTR is not delayed by | ||||
| waiting for these notifications to be sent. | ||||
| </t> | ||||
| <t> | ||||
| Changing the size of a file with SETATTR indirectly | ||||
| changes the time_modify and change attributes. | ||||
| A client must account for this as size changes can | ||||
| result in data deletion. | ||||
| </t> | ||||
| <t> | ||||
| The attributes time_access_set and time_modify_set are write-only | ||||
| attributes constructed as a switched union so the client can direct | ||||
| the server in setting the time values. If the switched union | ||||
| specifies SET_TO_CLIENT_TIME4, the client has provided an nfstime4 to | ||||
| be used for the operation. If the switched union does not specify | ||||
| SET_TO_CLIENT_TIME4, the server is to use its current time for the | ||||
| SETATTR operation. | ||||
| </t> | ||||
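| <t> | ||||
| The choice between client-supplied and server time can be sketched as | ||||
| follows. The discriminant values are those of the time_how4 enumeration in | ||||
| the protocol's XDR; resolve_set_time is a hypothetical helper, and an | ||||
| nfstime4 is modeled here as a (seconds, nseconds) tuple. | ||||
| </t> | ||||

```python
from typing import Optional, Tuple

SET_TO_SERVER_TIME4 = 0  # time_how4
SET_TO_CLIENT_TIME4 = 1  # time_how4

NfsTime = Tuple[int, int]  # (seconds, nseconds)


def resolve_set_time(
    set_it: int,
    client_time: Optional[NfsTime],
    server_now: NfsTime,
) -> NfsTime:
    """Return the time the server stores for time_access_set or time_modify_set."""
    if set_it == SET_TO_CLIENT_TIME4:
        if client_time is None:
            raise ValueError("SET_TO_CLIENT_TIME4 requires an nfstime4")
        return client_time
    # Any other discriminant carries no time; the server uses its own clock.
    return server_now
```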
| <t> | ||||
| If server and client times differ, programs that compare client time | ||||
| to file times can break. A time synchronization protocol should be used to | ||||
| limit client/server time skew. | ||||
| </t> | ||||
| <t> | ||||
| Use of a COMPOUND containing a VERIFY operation specifying only the | ||||
| change attribute, immediately followed by a SETATTR, provides a means | ||||
| whereby a client may specify a request that emulates the functionality | ||||
| of the SETATTR guard mechanism of NFSv3. Since the function | ||||
| of the guard mechanism is to avoid changes to the file attributes | ||||
| based on stale information, delays between checking of the guard | ||||
| condition and the setting of the attributes have the potential to | ||||
| compromise this function, as would the corresponding delay in the | ||||
| NFSv4 emulation. Therefore, NFSv4.1 servers <bcp14>SHOULD</bcp14> take | ||||
| care to avoid such delays, to the degree possible, when executing such | ||||
| a request. | ||||
| </t> | ||||
| <t> | ||||
| If the server does not support an attribute as requested by the | ||||
| client, the server <bcp14>SHOULD</bcp14> return NFS4ERR_ATTRNOTSUPP. | ||||
| </t> | ||||
| <t> | ||||
| A mask of the attributes actually set is returned by SETATTR in all | ||||
| cases. That mask <bcp14>MUST NOT</bcp14> include attribute bits not requested to be | ||||
| set by the client. | ||||
| If the attribute masks in the request and | ||||
| reply are equal, the status field in the reply <bcp14>MUST</bcp14> be NFS4_OK. | ||||
| </t> | ||||
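| <t> | ||||
| A client or test harness could check the two rules above along the | ||||
| following lines. This Python sketch models a bitmap4 as a set of attribute | ||||
| names for readability; check_setattr_reply and that representation are | ||||
| assumptions made for the example. | ||||
| </t> | ||||

```python
from typing import FrozenSet, List

NFS4_OK = 0


def check_setattr_reply(
    requested: FrozenSet[str],
    attrsset: FrozenSet[str],
    status: int,
) -> List[str]:
    """Return a list of rule violations found in a SETATTR reply."""
    problems = []
    if not attrsset.issubset(requested):
        problems.append("attrsset includes attributes that were not requested")
    if attrsset == requested and status != NFS4_OK:
        problems.append("all requested attributes were set but status is not NFS4_OK")
    return problems
```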
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_VERIFY" numbered="true" toc="default"> | ||||
| <name>Operation 37: VERIFY - Verify Same Attributes</name> | ||||
| <section toc="exclude" anchor="OP_VERIFY_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct VERIFY4args { | ||||
| /* CURRENT_FH: object */ | ||||
| fattr4 obj_attributes; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_VERIFY_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct VERIFY4res { | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_VERIFY_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The VERIFY operation is used to verify that attributes have the value | ||||
| assumed by the client before proceeding with the following operations in | ||||
| the COMPOUND request. If any of the attributes do not match, then the | ||||
| error NFS4ERR_NOT_SAME must be returned. The current filehandle | ||||
| retains its value after successful completion of the operation. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_VERIFY_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| One possible use of the VERIFY operation is the following series | ||||
| of operations. With this, the client is attempting to verify that the file | ||||
| being removed will match what the client expects to be removed. This | ||||
| series can help prevent the unintended deletion of a file. | ||||
| </t> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| PUTFH (directory filehandle) | ||||
| LOOKUP (file name) | ||||
| VERIFY (filehandle == fh) | ||||
| PUTFH (directory filehandle) | ||||
| REMOVE (file name)]]></sourcecode> | ||||
| <t> | ||||
| This series does not prevent a second client from removing and | ||||
| creating a new file in the middle of this sequence, but it does help | ||||
| avoid the unintended result. | ||||
| </t> | ||||
| <t> | ||||
| In the case that a <bcp14>RECOMMENDED</bcp14> attribute is specified in the VERIFY | ||||
| operation and the server does not support that attribute for the | ||||
| file system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | ||||
| client. | ||||
| </t> | ||||
| <t> | ||||
| When the attribute rdattr_error or any set-only attribute (e.g., | ||||
| time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | ||||
| the client. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_WRITE" numbered="true" toc="default"> | ||||
| <name>Operation 38: WRITE - Write to File</name> | ||||
| <section toc="exclude" anchor="OP_WRITE_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum stable_how4 { | ||||
| UNSTABLE4 = 0, | ||||
| DATA_SYNC4 = 1, | ||||
| FILE_SYNC4 = 2 | ||||
| }; | ||||
| struct WRITE4args { | ||||
| /* CURRENT_FH: file */ | ||||
| stateid4 stateid; | ||||
| offset4 offset; | ||||
| stable_how4 stable; | ||||
| opaque data<>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_WRITE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct WRITE4resok { | ||||
| count4 count; | ||||
| stable_how4 committed; | ||||
| verifier4 writeverf; | ||||
| }; | ||||
| union WRITE4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| WRITE4resok resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_WRITE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The WRITE operation is used to write data to a regular file. The | ||||
| target file is specified by the current filehandle. The offset | ||||
| specifies the offset where the data should be written. An offset of zero | ||||
| specifies that the write should start at the beginning of the | ||||
| file. The count, as encoded as part of the opaque data parameter, | ||||
| represents the number of bytes of data that are to be written. If the | ||||
| count is zero, the WRITE will succeed and return a count of zero, subject | ||||
| to permissions checking. The server <bcp14>MAY</bcp14> | ||||
| write fewer bytes than requested by the client. | ||||
| </t> | ||||
| <t> | ||||
| The client uses the stable parameter to specify | ||||
| how the data is to be processed by the server. If stable is | ||||
| FILE_SYNC4, the server <bcp14>MUST</bcp14> commit the data written plus all | ||||
| file system metadata to stable storage before returning results. This | ||||
| corresponds to the NFSv2 protocol semantics. Any other | ||||
| behavior constitutes a protocol violation. If stable is DATA_SYNC4, | ||||
| then the server <bcp14>MUST</bcp14> commit all of the data to stable storage and | ||||
| enough of the metadata to retrieve the data before returning. The | ||||
| server implementor is free to implement DATA_SYNC4 in the same fashion | ||||
| as FILE_SYNC4, but with a possible performance drop. If stable is | ||||
| UNSTABLE4, the server is free to commit any part of the data and the | ||||
| metadata to stable storage, including all or none, before returning a | ||||
| reply to the client. There is no guarantee whether or when any | ||||
| uncommitted data will subsequently be committed to stable storage. The | ||||
| only guarantees made by the server are that it will not destroy any | ||||
| data without changing the value of writeverf and that it will not commit | ||||
| the data and metadata at a level less than that requested by the | ||||
| client. | ||||
| </t> | ||||
| <t> | ||||
| Except when special stateids are used, the | ||||
| stateid value for a WRITE request represents a value returned from | ||||
| a previous byte-range LOCK or OPEN request or the stateid | ||||
| associated with a delegation. The stateid identifies the associated | ||||
| owners if any and is | ||||
| used by the server to verify that the associated locks are still | ||||
| valid (e.g., have not been revoked). | ||||
| </t> | ||||
| <t> | ||||
| Upon successful completion, the following results are returned. The | ||||
| count result is the number of bytes of data written to the file. The | ||||
| server may write fewer bytes than requested. If so, the actual number | ||||
| of bytes written, starting at the location given by offset, is returned. | ||||
| </t> | ||||
| <t> | ||||
| The server also returns an indication of the level of commitment of | ||||
| the data and metadata via committed. | ||||
| Per <xref target="stable_committed" format="default"/>, | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The server <bcp14>MAY</bcp14> commit the data at a stronger level | ||||
| than requested. | ||||
| </li> | ||||
| <li> | ||||
| The server <bcp14>MUST</bcp14> commit the data at a level at | ||||
| least as high as that committed. | ||||
| </li> | ||||
| </ul> | ||||
| <table anchor="stable_committed" align="center"> | ||||
| <name>Valid Combinations of the Fields Stable in the Request and Committed in the Reply</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">stable</th> | ||||
| <th align="left">committed</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">UNSTABLE4</td> | ||||
| <td align="left">FILE_SYNC4, DATA_SYNC4, UNSTABLE4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">DATA_SYNC4</td> | ||||
| <td align="left">FILE_SYNC4, DATA_SYNC4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">FILE_SYNC4</td> | ||||
| <td align="left">FILE_SYNC4</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
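| <t> | ||||
| Because the stable_how4 values increase with the strength of the | ||||
| commitment, the table reduces to a single comparison. The following Python | ||||
| sketch shows this; committed_is_valid is a hypothetical helper name. | ||||
| </t> | ||||

```python
UNSTABLE4 = 0   # stable_how4
DATA_SYNC4 = 1  # stable_how4
FILE_SYNC4 = 2  # stable_how4

_LEVELS = (UNSTABLE4, DATA_SYNC4, FILE_SYNC4)


def committed_is_valid(stable: int, committed: int) -> bool:
    """True if the reply's committed level is allowed for the requested stable level."""
    if stable not in _LEVELS or committed not in _LEVELS:
        return False
    # The server may commit at a stronger level than requested, never weaker.
    return committed >= stable
```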
| <t> | ||||
| The final portion of the result is the field | ||||
| writeverf. This field is the write verifier and is a | ||||
| cookie that the client can use to determine whether | ||||
| a server has changed instance state (e.g., server | ||||
| restart) between a call to WRITE and a subsequent | ||||
| call to either WRITE or COMMIT. This cookie <bcp14>MUST</bcp14> be | ||||
| unchanged during a single instance of the NFSv4.1 | ||||
| server and <bcp14>MUST</bcp14> be unique between instances of the | ||||
| NFSv4.1 server. If the cookie changes, then the | ||||
| client <bcp14>MUST</bcp14> assume that any data written with an | ||||
| UNSTABLE4 value for committed and an old writeverf in the reply | ||||
| has been lost and will need to be recovered. | ||||
| </t> | ||||
| <t> | ||||
| If a client writes data to the server with the stable argument set to | ||||
| UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or | ||||
| UNSTABLE4, the client will follow up some time in the future with a | ||||
| COMMIT operation to synchronize outstanding asynchronous data and | ||||
| metadata with the server's stable storage, barring client error. It is | ||||
| possible that, due to a client crash or other error, a subsequent | ||||
| COMMIT will not be received by the server. | ||||
| </t> | ||||
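| <t> | ||||
| One way a client might track write verifiers is sketched below in Python. | ||||
| The PendingWrites class and its method names are hypothetical, and the | ||||
| sketch assumes a single COMMIT that covers every outstanding WRITE to the | ||||
| file. Only data whose reply carried committed equal to UNSTABLE4 and a | ||||
| verifier different from the later one is treated as lost. | ||||
| </t> | ||||

```python
from dataclasses import dataclass, field
from typing import List, Tuple

UNSTABLE4, DATA_SYNC4, FILE_SYNC4 = 0, 1, 2  # stable_how4 values


@dataclass
class PendingWrites:
    """Hypothetical client-side record of WRITE replies awaiting COMMIT."""
    # Each element is (offset, data, committed, writeverf).
    pending: List[Tuple[int, bytes, int, bytes]] = field(default_factory=list)

    def on_write_reply(self, offset: int, data: bytes,
                       committed: int, writeverf: bytes) -> None:
        # A FILE_SYNC4 reply needs no follow-up COMMIT, so it is not tracked.
        if committed != FILE_SYNC4:
            self.pending.append((offset, data, committed, writeverf))

    def on_commit_reply(self, commit_verf: bytes) -> List[Tuple[int, bytes]]:
        """Return the (offset, data) pairs that have to be written again."""
        lost = [
            (offset, data)
            for (offset, data, committed, verf) in self.pending
            if committed == UNSTABLE4 and verf != commit_verf
        ]
        self.pending.clear()
        return lost
```

| <t> | ||||
| The caller would resend each returned range with a new WRITE and record the | ||||
| new replies in the same structure. | ||||
| </t> | ||||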
| <t> | ||||
| For a WRITE with a stateid value of all bits equal to zero, the server <bcp14>MAY</bcp14> allow | ||||
| the WRITE to be serviced subject to mandatory byte-range locks or the | ||||
| current share deny modes for the file. For a WRITE with a stateid | ||||
| value of all bits equal to 1, the server <bcp14>MUST NOT</bcp14> allow the WRITE operation to | ||||
| bypass locking checks at the server and otherwise is | ||||
| treated as if a stateid of all bits equal to zero were used. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_WRITE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| It is possible for the server to write fewer bytes of data than | ||||
| requested by the client. In this case, the server <bcp14>SHOULD NOT</bcp14> return | ||||
| an error unless no data was written at all. If the server writes less | ||||
| than the number of bytes specified, the client will need to send another | ||||
| WRITE to write the remaining data. | ||||
| </t> | ||||
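| <t> | ||||
| The client-side handling of short writes can be sketched as a loop. In this | ||||
| Python illustration, send_write is a hypothetical callable standing in for | ||||
| one WRITE round trip that returns the count from the reply. Treating a | ||||
| successful reply with a count of zero as an error is a choice made by the | ||||
| sketch to guarantee termination. | ||||
| </t> | ||||

```python
from typing import Callable


def write_all(send_write: Callable[[int, bytes], int],
              offset: int, data: bytes) -> int:
    """Send WRITEs until all of data has been accepted; return the byte count."""
    written = 0
    while len(data) > written:
        # Resume at the first byte the server has not yet accepted.
        count = send_write(offset + written, data[written:])
        if count == 0:
            raise OSError("WRITE succeeded but no data was written")
        written += count
    return written
```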
| <t> | ||||
| It is assumed that the act of writing data to | ||||
| a file will cause the time_modify and change | ||||
| attributes of the file to be updated. However, | ||||
| these attributes <bcp14>SHOULD NOT</bcp14> be changed | ||||
| unless the contents of the file are changed. Thus, | ||||
| a WRITE request with count set to zero <bcp14>SHOULD NOT</bcp14> cause | ||||
| the time_modify and change attributes of the file to be updated. | ||||
| </t> | ||||
| <t> | ||||
| Stable storage is persistent storage that survives: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Repeated power failures. | ||||
| </li> | ||||
| <li> | ||||
| Hardware failures (of any board, power supply, etc.). | ||||
| </li> | ||||
| <li> | ||||
| Repeated software crashes and restarts. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| This definition does not address failure of the stable storage module | ||||
| itself. | ||||
| </t> | ||||
| <t> | ||||
| The verifier is defined to allow a client to detect | ||||
| different instances of an NFSv4.1 protocol server | ||||
| over which cached, uncommitted data may be lost. In | ||||
| the most likely case, the verifier allows the client | ||||
| to detect server restarts. This information is | ||||
| required so that the client can safely determine | ||||
| whether the server could have lost cached data. | ||||
| If the server fails unexpectedly and the client has | ||||
| uncommitted data from previous WRITE requests (done | ||||
| with the stable argument set to UNSTABLE4 and in | ||||
| which the result committed was returned as UNSTABLE4 | ||||
| as well), the server might not have flushed cached | ||||
| data to stable storage. The burden of recovery is | ||||
| on the client, and the client will need to retransmit | ||||
| the data to the server. | ||||
| </t> | ||||
| <t> | ||||
| A suggested verifier would be to use the time that | ||||
| the server was last started (if restarting the server | ||||
| results in lost buffers). | ||||
| </t> | ||||
| <t> | ||||
| The reply's committed field allows the client to do more | ||||
| effective caching. If the server is committing all WRITE requests to | ||||
| stable storage, then it <bcp14>SHOULD</bcp14> return with committed set to FILE_SYNC4, | ||||
| regardless of the value of the stable field in the arguments. A server | ||||
| that uses an NVRAM accelerator may choose to implement this policy. | ||||
| The client can use this to increase the effectiveness of the cache by | ||||
| discarding cached data that has already been committed on the server. | ||||
| </t> | ||||
| <t> | ||||
| Some implementations may return NFS4ERR_NOSPC instead | ||||
| of NFS4ERR_DQUOT when a user's quota is exceeded. | ||||
| </t> | ||||
| <t> | ||||
| In the case that the current filehandle is of | ||||
| type NF4DIR, the server will return NFS4ERR_ISDIR. | ||||
| If the current file is a symbolic link, the error | ||||
| NFS4ERR_SYMLINK will be returned. Otherwise, if the | ||||
| current filehandle does not designate an ordinary | ||||
| file, the server will return NFS4ERR_WRONG_TYPE. | ||||
| </t> | ||||
| <t> | ||||
| If mandatory byte-range locking is in effect for the file, | ||||
| and the corresponding byte-range of the data to | ||||
| be written to the file is READ_LT or WRITE_LT locked by | ||||
| an owner that is not associated with the stateid, | ||||
| the server <bcp14>MUST</bcp14> return NFS4ERR_LOCKED. If so, | ||||
| the client <bcp14>MUST</bcp14> check if the owner corresponding | ||||
| to the stateid used with the WRITE operation has a | ||||
| conflicting READ_LT lock that overlaps with the byte-range | ||||
| that was to be written. If the stateid's owner has | ||||
| no conflicting READ_LT lock, then the client <bcp14>SHOULD</bcp14> try | ||||
| to get the appropriate write byte-range lock via the | ||||
| LOCK operation before re-attempting the WRITE. When | ||||
| the WRITE completes, the client <bcp14>SHOULD</bcp14> release the | ||||
| byte-range lock via LOCKU. | ||||
| </t> | ||||
| <t> | ||||
| If the stateid's owner had a conflicting READ_LT lock, then the client | ||||
| has no choice but to return an error to the application that attempted | ||||
| the WRITE. The reason is that, since the stateid's owner had a READ_LT | ||||
| lock, either the server attempted to temporarily upgrade this | ||||
| READ_LT lock, in effect, to a WRITE_LT lock, or the server has no upgrade | ||||
| capability. If the server attempted to upgrade the READ_LT lock and | ||||
| failed, it is pointless for the client to re-attempt the upgrade via | ||||
| the LOCK operation, because there might be another client also trying | ||||
| to upgrade. If two clients are blocked trying to upgrade the same lock, | ||||
| the clients deadlock. If the server has no upgrade capability, then | ||||
| it is pointless to try a LOCK operation to upgrade. | ||||
| </t> | ||||
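| <t> | ||||
| The client's handling of NFS4ERR_LOCKED described in the two preceding | ||||
| paragraphs might be sketched as follows. This is a hedged Python | ||||
| illustration: retry_locked_write and its callable parameters (lock, | ||||
| write, unlock) are hypothetical stand-ins for the LOCK, WRITE, and | ||||
| LOCKU operations, and the string returned for the error case is an | ||||
| assumption made for the example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Callable


def retry_locked_write(owner_has_conflicting_read_lock: bool,
                       lock: Callable[[], None],
                       write: Callable[[], str],
                       unlock: Callable[[], None]) -> str:
    # If the stateid's owner already holds a conflicting READ_LT lock,
    # retrying via LOCK is pointless; report an error to the application.
    if owner_has_conflicting_read_lock:
        return "error-to-application"
    # Otherwise obtain the write byte-range lock, re-attempt the WRITE,
    # and release the lock once the WRITE completes.
    lock()
    try:
        return write()
    finally:
        unlock()
```
| ]]></sourcecode> | ||||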
| <t> | ||||
| If one or more other clients have delegations for the file being | ||||
| written, those delegations <bcp14>MUST</bcp14> be recalled, and the | ||||
| operation cannot proceed until those delegations are returned | ||||
| or revoked. Except where this | ||||
| happens very quickly, one or more NFS4ERR_DELAY errors will be | ||||
| returned to requests made while the delegation remains outstanding. | ||||
| Normally, delegations will not be recalled as a result of a WRITE | ||||
| operation since the recall will occur as a result of an earlier | ||||
| OPEN. However, since it is possible for a WRITE to be done with | ||||
| a special stateid, the server needs to check for this case even | ||||
| though the client should have done an OPEN previously. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_BACKCHANNEL_CTL" numbered="true" toc="default"> | ||||
| <name>Operation 40: BACKCHANNEL_CTL - Backchannel Control</name> | ||||
| <section toc="exclude" anchor="OP_BACKCHANNEL_CTL_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| typedef opaque gsshandle4_t<>; | ||||
| struct gss_cb_handles4 { | ||||
| rpc_gss_svc_t gcbp_service; /* RFC 2203 */ | ||||
| gsshandle4_t gcbp_handle_from_server; | ||||
| gsshandle4_t gcbp_handle_from_client; | ||||
| }; | ||||
| union callback_sec_parms4 switch (uint32_t cb_secflavor) { | ||||
| case AUTH_NONE: | ||||
| void; | ||||
| case AUTH_SYS: | ||||
| authsys_parms cbsp_sys_cred; /* RFC 1831 */ | ||||
| case RPCSEC_GSS: | ||||
| gss_cb_handles4 cbsp_gss_handles; | ||||
| }; | ||||
| struct BACKCHANNEL_CTL4args { | ||||
| uint32_t bca_cb_program; | ||||
| callback_sec_parms4 bca_sec_parms<>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_BACKCHANNEL_CTL_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct BACKCHANNEL_CTL4res { | ||||
| nfsstat4 bcr_status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_BACKCHANNEL_CTL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The BACKCHANNEL_CTL operation replaces the | ||||
| backchannel's callback program number and adds | ||||
| (not replaces) RPCSEC_GSS handles for use by the | ||||
| backchannel. | ||||
| </t> | ||||
| <t> | ||||
| The arguments of the BACKCHANNEL_CTL call are | ||||
| a subset of the CREATE_SESSION parameters. | ||||
| In the arguments of BACKCHANNEL_CTL, the | ||||
| bca_cb_program and bca_sec_parms fields | ||||
| correspond respectively to the csa_cb_program and | ||||
| csa_sec_parms fields of the arguments of CREATE_SESSION | ||||
| (<xref target="OP_CREATE_SESSION" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| BACKCHANNEL_CTL <bcp14>MUST</bcp14> appear in a COMPOUND that starts | ||||
| with SEQUENCE. | ||||
| </t> | ||||
| <t> | ||||
| If the RPCSEC_GSS handle identified by | ||||
| gcbp_handle_from_server does not exist on the server, | ||||
| the server <bcp14>MUST</bcp14> return NFS4ERR_NOENT. | ||||
| </t> | ||||
| <t> | ||||
| If an RPCSEC_GSS handle is using the SSV context (see <xref target="ssv_mech" format="default"/>), then because each SSV RPCSEC_GSS | ||||
| handle shares a common SSV GSS context, there are security | ||||
| considerations specific to this situation discussed in <xref target="rpcsec_ssv_consider" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_BIND_CONN_TO_SESSION" numbered="true" toc="default"> | ||||
| <name>Operation 41: BIND_CONN_TO_SESSION - Associate Connection with Session</name> | ||||
| <section toc="exclude" anchor="OP_BIND_CONN_TO_SESSION_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum channel_dir_from_client4 { | ||||
| CDFC4_FORE = 0x1, | ||||
| CDFC4_BACK = 0x2, | ||||
| CDFC4_FORE_OR_BOTH = 0x3, | ||||
| CDFC4_BACK_OR_BOTH = 0x7 | ||||
| }; | ||||
| struct BIND_CONN_TO_SESSION4args { | ||||
| sessionid4 bctsa_sessid; | ||||
| channel_dir_from_client4 | ||||
| bctsa_dir; | ||||
| bool bctsa_use_conn_in_rdma_mode; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_BIND_CONN_TO_SESSION_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum channel_dir_from_server4 { | ||||
| CDFS4_FORE = 0x1, | ||||
| CDFS4_BACK = 0x2, | ||||
| CDFS4_BOTH = 0x3 | ||||
| }; | ||||
| struct BIND_CONN_TO_SESSION4resok { | ||||
| sessionid4 bctsr_sessid; | ||||
| channel_dir_from_server4 | ||||
| bctsr_dir; | ||||
| bool bctsr_use_conn_in_rdma_mode; | ||||
| }; | ||||
| union BIND_CONN_TO_SESSION4res | ||||
| switch (nfsstat4 bctsr_status) { | ||||
| case NFS4_OK: | ||||
| BIND_CONN_TO_SESSION4resok | ||||
| bctsr_resok4; | ||||
| default: void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_BIND_CONN_TO_SESSION_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| BIND_CONN_TO_SESSION is used to associate additional connections with a | ||||
| session. It <bcp14>MUST</bcp14> be used on the connection being associated with the session. It <bcp14>MUST</bcp14> | ||||
| be the only operation in the COMPOUND procedure. If | ||||
| SP4_NONE (<xref target="OP_EXCHANGE_ID" format="default"/>) state protection | ||||
| is used, any principal, | ||||
| security flavor, or RPCSEC_GSS context <bcp14>MAY</bcp14> be used to invoke the operation. | ||||
| If SP4_MACH_CRED is used, RPCSEC_GSS <bcp14>MUST</bcp14> be used with the | ||||
| integrity or privacy services, using the principal that | ||||
| created the client ID. If SP4_SSV is used, RPCSEC_GSS with | ||||
| the SSV GSS mechanism (<xref target="ssv_mech" format="default"/>) and integrity or | ||||
| privacy <bcp14>MUST</bcp14> be used. | ||||
| </t> | ||||
| <t> | ||||
| If, when the client ID was created, the client opted for SP4_NONE | ||||
| state protection, | ||||
| the client is not required to use BIND_CONN_TO_SESSION to associate the | ||||
| connection with the session, unless | ||||
| the client wishes to associate the connection with the backchannel. | ||||
| When SP4_NONE protection is used, simply sending a COMPOUND | ||||
| request with a SEQUENCE operation is sufficient to associate the | ||||
| connection with the session specified in SEQUENCE. | ||||
| </t> | ||||
| <t> | ||||
| The field bctsa_dir indicates whether the client | ||||
| wants to associate the connection with the fore | ||||
| channel or the backchannel or both channels. The value | ||||
| CDFC4_FORE_OR_BOTH indicates that the client wants to | ||||
| associate the connection with both the fore channel and backchannel, | ||||
| but will accept the connection being associated to | ||||
| just the fore channel. The value CDFC4_BACK_OR_BOTH | ||||
| indicates that the client wants to associate with both | ||||
| the fore channel and backchannel, but will accept the | ||||
| connection being associated with just the backchannel. | ||||
| The server replies in bctsr_dir which channel(s) | ||||
| the connection is associated with. | ||||
| If the client specified CDFC4_FORE, the server | ||||
| <bcp14>MUST</bcp14> return CDFS4_FORE. If the client specified | ||||
| CDFC4_BACK, the server <bcp14>MUST</bcp14> return CDFS4_BACK. If the | ||||
| client specified CDFC4_FORE_OR_BOTH, the server <bcp14>MUST</bcp14> return | ||||
| CDFS4_FORE or CDFS4_BOTH. If the client specified | ||||
| CDFC4_BACK_OR_BOTH, the server <bcp14>MUST</bcp14> return CDFS4_BACK | ||||
| or CDFS4_BOTH. | ||||
| </t> | ||||
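| <t> | ||||
| The permitted pairings of bctsa_dir and bctsr_dir stated above can be | ||||
| summarized in a small lookup table. The following Python sketch is | ||||
| illustrative only: the constant values are taken from the XDR above, | ||||
| while ALLOWED_REPLIES and reply_is_valid are hypothetical names | ||||
| introduced for this example. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# channel_dir_from_client4 values (from the ARGUMENT XDR)
CDFC4_FORE = 0x1
CDFC4_BACK = 0x2
CDFC4_FORE_OR_BOTH = 0x3
CDFC4_BACK_OR_BOTH = 0x7

# channel_dir_from_server4 values (from the RESULT XDR)
CDFS4_FORE = 0x1
CDFS4_BACK = 0x2
CDFS4_BOTH = 0x3

# For each requested direction, the set of directions the server
# is allowed to return in bctsr_dir.
ALLOWED_REPLIES = {
    CDFC4_FORE: {CDFS4_FORE},
    CDFC4_BACK: {CDFS4_BACK},
    CDFC4_FORE_OR_BOTH: {CDFS4_FORE, CDFS4_BOTH},
    CDFC4_BACK_OR_BOTH: {CDFS4_BACK, CDFS4_BOTH},
}


def reply_is_valid(bctsa_dir: int, bctsr_dir: int) -> bool:
    # Unknown request values have no permitted reply.
    return bctsr_dir in ALLOWED_REPLIES.get(bctsa_dir, set())
```
| ]]></sourcecode> | ||||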
| <t> | ||||
| See the CREATE_SESSION operation (<xref target="OP_CREATE_SESSION" format="default"/>), | ||||
| and the description of the argument | ||||
| csa_use_conn_in_rdma_mode to understand | ||||
| bctsa_use_conn_in_rdma_mode, and the description of | ||||
| csr_use_conn_in_rdma_mode to understand bctsr_use_conn_in_rdma_mode. | ||||
| </t> | ||||
| <t> | ||||
| Invoking BIND_CONN_TO_SESSION on a connection already associated | ||||
| with the specified session has no effect, and the server <bcp14>MUST</bcp14> | ||||
| respond with NFS4_OK, unless the client is demanding changes | ||||
| to the set of channels the connection is associated with. If | ||||
| so, the server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_BIND_CONN_TO_SESSION_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If a session's channel loses all connections, depending on | ||||
| the client ID's state protection and type of channel, | ||||
| the client might need to use | ||||
| BIND_CONN_TO_SESSION to associate a new connection. If the | ||||
| server restarted and does not keep the reply cache in stable | ||||
| storage, the server will not recognize the session ID. | ||||
| The client will ultimately have to invoke EXCHANGE_ID to | ||||
| create a new client ID and session. | ||||
| </t> | ||||
| <t> | ||||
| Suppose SP4_SSV state protection is being used, | ||||
| and BIND_CONN_TO_SESSION is among the operations | ||||
| included in the spo_must_enforce set when the | ||||
| client ID was created (<xref target="OP_EXCHANGE_ID" format="default"/>). | ||||
| If so, there is an issue if SET_SSV is sent, no response | ||||
| is returned, and the last connection associated | ||||
| with the client ID drops. The client, per | ||||
| the sessions model, <bcp14>MUST</bcp14> retry the SET_SSV. But | ||||
| it needs a new connection to do so, and <bcp14>MUST</bcp14> | ||||
| associate that connection with the session via a | ||||
| BIND_CONN_TO_SESSION authenticated with the SSV | ||||
| GSS mechanism. The problem is that the RPCSEC_GSS | ||||
| message integrity codes use a subkey derived from the SSV as the | ||||
| key and the | ||||
| SSV may have changed. While there are multiple | ||||
| recovery strategies, a single, general strategy | ||||
| is described here. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client reconnects. | ||||
| </li> | ||||
| <li> | ||||
| The client assumes that the SET_SSV was executed, | ||||
| and so sends BIND_CONN_TO_SESSION with the subkey (derived from | ||||
| the new SSV, i.e., what SET_SSV would have set the SSV to) | ||||
| used as the key for the RPCSEC_GSS credential message integrity codes. | ||||
| </li> | ||||
| <li> | ||||
| If the request succeeds, this means that the original attempted SET_SSV | ||||
| did execute successfully. The client re-sends the original | ||||
| SET_SSV, which the server will reply to via the | ||||
| reply cache. | ||||
| </li> | ||||
| <li> | ||||
| If the server returns an RPC authentication error, | ||||
| this means that the server's current SSV was not changed | ||||
| (and the SET_SSV was likely not executed). The client then | ||||
| tries BIND_CONN_TO_SESSION with the subkey derived from the | ||||
| old SSV as the | ||||
| key for the RPCSEC_GSS message integrity codes. | ||||
| </li> | ||||
| <li> | ||||
| The attempted BIND_CONN_TO_SESSION with the old SSV | ||||
| should succeed. If so, the client re-sends the original | ||||
| SET_SSV. If the original SET_SSV was not executed, then the | ||||
| server executes it. If the original SET_SSV was executed but | ||||
| failed, the server will return the SET_SSV from the reply | ||||
| cache. | ||||
| </li> | ||||
| </ul> | ||||
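| <t> | ||||
| The recovery steps listed above might be sketched as follows. This | ||||
| Python fragment is a non-normative illustration: RpcAuthError, | ||||
| recover_after_lost_set_ssv, and the callables passed to it (reconnect, | ||||
| bind_conn, resend_set_ssv) are hypothetical placeholders for the | ||||
| client's transport and RPC layers, and the subkeys are assumed to have | ||||
| been derived already from the new and old SSV values. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import Callable


class RpcAuthError(Exception):
    """Hypothetical stand-in for an RPC authentication error."""


def recover_after_lost_set_ssv(reconnect: Callable[[], None],
                               bind_conn: Callable[[bytes], None],
                               resend_set_ssv: Callable[[], None],
                               new_subkey: bytes,
                               old_subkey: bytes) -> str:
    # Step 1: establish a new connection.
    reconnect()
    try:
        # Step 2: assume SET_SSV executed; bind using the new subkey.
        bind_conn(new_subkey)
        used = "new"
    except RpcAuthError:
        # Step 4: the server's SSV likely did not change; use the old one.
        bind_conn(old_subkey)
        used = "old"
    # Steps 3 and 5: in either case, re-send the original SET_SSV; the
    # server replies from the reply cache or executes the request.
    resend_set_ssv()
    return used
```
| ]]></sourcecode> | ||||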
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_EXCHANGE_ID" numbered="true" toc="default"> | ||||
| <name>Operation 42: EXCHANGE_ID - Instantiate Client ID</name> | ||||
| <t> | ||||
| The EXCHANGE_ID operation exchanges long-hand client and server identifiers | ||||
| (owners) and provides access to a client ID, creating one | ||||
| if necessary. This client ID becomes associated with the connection | ||||
| on which the operation is done, so that it is available when a | ||||
| CREATE_SESSION is done or when the connection is used to issue | ||||
| a request | ||||
| on an existing session associated with the current client. | ||||
| </t> | ||||
| <section anchor="EXID-arg" toc="exclude" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; | ||||
| const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; | ||||
| const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; | ||||
| const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; | ||||
| const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; | ||||
| const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; | ||||
| const EXCHGID4_FLAG_MASK_PNFS = 0x00070000; | ||||
| const EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000; | ||||
| const EXCHGID4_FLAG_CONFIRMED_R = 0x80000000; | ||||
| struct state_protect_ops4 { | ||||
| bitmap4 spo_must_enforce; | ||||
| bitmap4 spo_must_allow; | ||||
| }; | ||||
| struct ssv_sp_parms4 { | ||||
| state_protect_ops4 ssp_ops; | ||||
| sec_oid4 ssp_hash_algs<>; | ||||
| sec_oid4 ssp_encr_algs<>; | ||||
| uint32_t ssp_window; | ||||
| uint32_t ssp_num_gss_handles; | ||||
| }; | ||||
| enum state_protect_how4 { | ||||
| SP4_NONE = 0, | ||||
| SP4_MACH_CRED = 1, | ||||
| SP4_SSV = 2 | ||||
| }; | ||||
| union state_protect4_a switch(state_protect_how4 spa_how) { | ||||
| case SP4_NONE: | ||||
| void; | ||||
| case SP4_MACH_CRED: | ||||
| state_protect_ops4 spa_mach_ops; | ||||
| case SP4_SSV: | ||||
| ssv_sp_parms4 spa_ssv_parms; | ||||
| }; | ||||
| struct EXCHANGE_ID4args { | ||||
| client_owner4 eia_clientowner; | ||||
| uint32_t eia_flags; | ||||
| state_protect4_a eia_state_protect; | ||||
| nfs_impl_id4 eia_client_impl_id<1>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section anchor="EXID-res" toc="exclude" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct ssv_prot_info4 { | ||||
| state_protect_ops4 spi_ops; | ||||
| uint32_t spi_hash_alg; | ||||
| uint32_t spi_encr_alg; | ||||
| uint32_t spi_ssv_len; | ||||
| uint32_t spi_window; | ||||
| gsshandle4_t spi_handles<>; | ||||
| }; | ||||
| union state_protect4_r switch(state_protect_how4 spr_how) { | ||||
| case SP4_NONE: | ||||
| void; | ||||
| case SP4_MACH_CRED: | ||||
| state_protect_ops4 spr_mach_ops; | ||||
| case SP4_SSV: | ||||
| ssv_prot_info4 spr_ssv_info; | ||||
| }; | ||||
| struct EXCHANGE_ID4resok { | ||||
| clientid4 eir_clientid; | ||||
| sequenceid4 eir_sequenceid; | ||||
| uint32_t eir_flags; | ||||
| state_protect4_r eir_state_protect; | ||||
| server_owner4 eir_server_owner; | ||||
| opaque eir_server_scope<NFS4_OPAQUE_LIMIT>; | ||||
| nfs_impl_id4 eir_server_impl_id<1>; | ||||
| }; | ||||
| union EXCHANGE_ID4res switch (nfsstat4 eir_status) { | ||||
| case NFS4_OK: | ||||
| EXCHANGE_ID4resok eir_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section anchor="OP_EXCHANGE_ID_DESCRIPTION" toc="exclude" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The client uses the EXCHANGE_ID operation to register | ||||
| a particular instance of that client with the server, | ||||
| as represented by a client_owner4. However, | ||||
| when the client_owner4 has already been registered | ||||
| by other means (e.g., Transparent State Migration), the | ||||
| client may still use EXCHANGE_ID to obtain the client ID | ||||
| assigned previously. | ||||
| </t> | ||||
| <t> | ||||
| The client ID returned from this | ||||
| operation will be associated with the connection | ||||
| on which the EXCHANGE_ID is received and | ||||
| will serve as a parent object for | ||||
| sessions created by the client on this connection or | ||||
| to which the connection is bound. As a result of using | ||||
| those sessions to make requests involving the creation | ||||
| of state, that state will become associated with the | ||||
| client ID returned. | ||||
| </t> | ||||
| <t> | ||||
| In situations in which the registration of the | ||||
| client_owner has not occurred previously, | ||||
| the client ID must first be used, along with | ||||
| the returned eir_sequenceid, in creating an | ||||
| associated session using | ||||
| CREATE_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the | ||||
| result, eir_flags, then it is an indication that the | ||||
| registration of the client_owner has already occurred | ||||
| and that a further CREATE_SESSION is not needed to | ||||
| confirm it. Of course, subsequent CREATE_SESSION | ||||
| operations may | ||||
| be needed for other reasons. | ||||
| </t> | ||||
| <t> | ||||
| The value eir_sequenceid is used to establish an initial | ||||
| sequence value associated with the client ID returned. In | ||||
| cases in which a CREATE_SESSION has already been done, | ||||
| sequencing of such requests has already been established, | ||||
| so the client has no need for this value and will ignore it. | ||||
| </t> | ||||
| <t> | ||||
| EXCHANGE_ID <bcp14>MAY</bcp14> be sent in a COMPOUND procedure that starts with | ||||
| SEQUENCE. However, when a client communicates with a server | ||||
| for the first time, it will not have a session, so using | ||||
| SEQUENCE will not be possible. | ||||
| If EXCHANGE_ID is sent without a preceding SEQUENCE, then it | ||||
| <bcp14>MUST</bcp14> be the only operation in the COMPOUND procedure's request. If | ||||
| it is not, the server <bcp14>MUST</bcp14> return NFS4ERR_NOT_ONLY_OP. | ||||
| </t> | ||||
| <t> | ||||
| The eia_clientowner field is composed of a co_verifier | ||||
| field and a co_ownerid string. As noted in | ||||
| <xref target="Client_Identifiers" format="default"/>, the co_ownerid | ||||
| identifies the client, and the co_verifier specifies a particular | ||||
| incarnation of that client. An EXCHANGE_ID | ||||
| sent with a new incarnation of the client will | ||||
| lead to the server removing lock state of the old | ||||
| incarnation. On the other hand, when an EXCHANGE_ID sent with the current | ||||
| incarnation and co_ownerid does not result in an unrelated error, | ||||
| it will potentially update an existing client ID's properties or | ||||
| simply return information about the existing client ID. The latter | ||||
| would happen when this operation is done to the same server | ||||
| using different network addresses as part of creating trunked | ||||
| connections. | ||||
| </t> | ||||
| <t> | ||||
| A server <bcp14>MUST NOT</bcp14> provide the same client ID to two different | ||||
| incarnations of an eia_clientowner. | ||||
| </t> | ||||
| <t> | ||||
| In addition to the client ID and sequence ID, the server | ||||
| returns a server owner (eir_server_owner) and | ||||
| server scope (eir_server_scope). The former field is used | ||||
| in connection with | ||||
| network trunking as described in <xref target="Trunking" format="default"/>. The latter field is used to | ||||
| allow clients to determine when client IDs sent by | ||||
| one server may be recognized by another in the event | ||||
| of file system migration (see <xref target="SEC11-EFF-lock" format="default"/> of the current document). | ||||
| </t> | ||||
| <t> | ||||
| The client ID returned by EXCHANGE_ID is only unique | ||||
| relative to the combination of eir_server_owner.so_major_id | ||||
| and eir_server_scope. Thus, if two servers return the | ||||
| same client ID, the onus is on the client to | ||||
| distinguish the client IDs on the basis of eir_server_owner.so_major_id | ||||
| and eir_server_scope. In the event two different servers | ||||
| claim matching eir_server_owner.so_major_id and eir_server_scope, | ||||
| the client can use the verification techniques discussed | ||||
| in <xref target="PREP-trunk-verify" format="default"/> to determine if the servers | ||||
| are distinct. If they are distinct, then the client | ||||
| will need to note the destination network addresses | ||||
| of the connections used with each server and use | ||||
| the network address as the final discriminator. | ||||
| </t> | ||||
| <t> | ||||
| The server, as defined by the unique identity expressed | ||||
| in the so_major_id of the server owner and the server scope, | ||||
| needs to track several properties of each client ID it | ||||
| hands out. The properties apply to the client ID and all | ||||
| sessions associated with the client ID. | ||||
| The properties are derived from the | ||||
| arguments and results of EXCHANGE_ID. | ||||
| The client ID properties include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The capabilities expressed by the following bits, which | ||||
| come from the results of EXCHANGE_ID: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li>EXCHGID4_FLAG_SUPP_MOVED_REFER</li> | ||||
| <li>EXCHGID4_FLAG_SUPP_MOVED_MIGR </li> | ||||
| <li>EXCHGID4_FLAG_BIND_PRINC_STATEID </li> | ||||
| <li>EXCHGID4_FLAG_USE_NON_PNFS </li> | ||||
| <li>EXCHGID4_FLAG_USE_PNFS_MDS </li> | ||||
| <li>EXCHGID4_FLAG_USE_PNFS_DS </li> | ||||
| </ul> | ||||
| <t> | ||||
| These properties may be updated by subsequent | ||||
| EXCHANGE_ID operations on confirmed client IDs, though the server <bcp14>MAY</bcp14> | ||||
| refuse to change them. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| The state protection method used, one of SP4_NONE, | ||||
| SP4_MACH_CRED, or SP4_SSV, as set by the spa_how | ||||
| field of the arguments to EXCHANGE_ID. Once the | ||||
| client ID is confirmed, this property cannot be | ||||
| updated by subsequent EXCHANGE_ID operations. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| For SP4_MACH_CRED or SP4_SSV state protection: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The list of operations (spo_must_enforce) that <bcp14>MUST</bcp14> use the specified | ||||
| state protection. This list comes | ||||
| from the results of EXCHANGE_ID. | ||||
| </li> | ||||
| <li> | ||||
| The list of operations (spo_must_allow) that <bcp14>MAY</bcp14> use the specified | ||||
| state protection. This list comes | ||||
| from the results of EXCHANGE_ID. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Once the client ID is confirmed, these properties | ||||
| cannot be updated by subsequent EXCHANGE_ID | ||||
| requests. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| For SP4_SSV protection: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The OID of the hash algorithm. This property is | ||||
| represented by one of the algorithms in the | ||||
| ssp_hash_algs field of the EXCHANGE_ID arguments. | ||||
| Once the client ID is confirmed, this property | ||||
| cannot be updated by subsequent EXCHANGE_ID | ||||
| requests. | ||||
| </li> | ||||
| <li> | ||||
| The OID of the encryption algorithm. This property | ||||
| is represented by one of the algorithms in the | ||||
| ssp_encr_algs field of the EXCHANGE_ID arguments. | ||||
| Once the client ID is confirmed, this property | ||||
| cannot be updated by subsequent EXCHANGE_ID | ||||
| requests. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The length of the SSV. This property is | ||||
| represented by the spi_ssv_len field in the EXCHANGE_ID | ||||
| results. | ||||
| Once the client ID is confirmed, | ||||
| this property cannot be updated by | ||||
| subsequent EXCHANGE_ID operations. | ||||
| </t> | ||||
| <t> | ||||
| There are <bcp14>REQUIRED</bcp14> and <bcp14>RECOMMENDED</bcp14> relationships among the | ||||
| length of the key of the encryption algorithm ("key length"), the length of the | ||||
| output of the hash algorithm ("hash length"), and the length of the SSV ("SSV length"). | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| key length <bcp14>MUST</bcp14> be <= hash length. This is because the keys used for | ||||
| the encryption algorithm are actually subkeys derived from the SSV, | ||||
| and the derivation is via the hash algorithm. The selection of an | ||||
| encryption algorithm with a key length that exceeded the length of | ||||
| the output of the hash algorithm would require padding, and thus | ||||
| weaken the use of the encryption algorithm. | ||||
| </li> | ||||
| <li> | ||||
| hash length <bcp14>SHOULD</bcp14> be <= SSV length. This is because the | ||||
| SSV is a key used to derive subkeys via an HMAC, and | ||||
| it is recommended that the key used as input to an HMAC be | ||||
| at least as long as the length of the HMAC's hash algorithm's | ||||
| output (see <xref target="RFC2104" sectionFormat="of" section="3"/>). | ||||
| </li> | ||||
| <li> | ||||
| key length <bcp14>SHOULD</bcp14> be <= SSV length. This is a transitive result of the | ||||
| above two invariants. | ||||
| </li> | ||||
| <li> | ||||
| key length <bcp14>SHOULD</bcp14> be >= hash length / 2. This is because the subkey | ||||
| derivation is via | ||||
| an HMAC and it is recommended that if the HMAC has to be truncated, | ||||
| it should not be truncated to less than half the hash length | ||||
| (see Section <xref target="RFC2104" sectionFormat="bare" section="4"/> | ||||
| of RFC 2104 <xref target="RFC2104" format="default"/>). | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| Number of concurrent versions of the SSV the client | ||||
| and server will support (see <xref target="ssv_mech" format="default"/>). | ||||
| This property is represented by spi_window | ||||
| in the EXCHANGE_ID results. The property may be | ||||
| updated by subsequent EXCHANGE_ID operations. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| The client's implementation ID as represented by | ||||
| the eia_client_impl_id field of the arguments. | ||||
| The property may be updated by subsequent EXCHANGE_ID | ||||
| requests. | ||||
| </li> | ||||
| <li> | ||||
| The server's implementation ID as represented by | ||||
| the eir_server_impl_id field of the reply. | ||||
| The property may be updated by replies to subsequent EXCHANGE_ID | ||||
| requests. | ||||
| </li> | ||||
| </ul> | ||||
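| <t> | ||||
| The length relationships given above for SP4_SSV protection (among key | ||||
| length, hash length, and SSV length) lend themselves to a simple check. | ||||
| The following Python sketch is illustrative only: check_ssv_lengths is a | ||||
| hypothetical helper, all lengths are assumed to be expressed in the same | ||||
| unit, and violations of <bcp14>MUST</bcp14> and <bcp14>SHOULD</bcp14> | ||||
| relationships are reported separately as errors and warnings. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from typing import List, Tuple


def check_ssv_lengths(key_len: int, hash_len: int,
                      ssv_len: int) -> Tuple[List[str], List[str]]:
    errors: List[str] = []    # violated MUST relationships
    warnings: List[str] = []  # violated SHOULD relationships
    if key_len > hash_len:
        errors.append("key length MUST be <= hash length")
    if hash_len > ssv_len:
        warnings.append("hash length SHOULD be <= SSV length")
    if key_len > ssv_len:
        warnings.append("key length SHOULD be <= SSV length")
    # Compare 2 * key_len with hash_len to avoid fractional division.
    if 2 * key_len < hash_len:
        warnings.append("key length SHOULD be >= hash length / 2")
    return errors, warnings
```
| ]]></sourcecode> | ||||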
| <t> | ||||
| The eia_flags passed as part of the arguments and | ||||
| the eir_flags results allow the client and server | ||||
| to inform each other of their capabilities as well | ||||
| as indicate how the client ID will be used. Whether | ||||
| a bit is set or cleared on the arguments' flags | ||||
| does not force the server to set or clear the same | ||||
| bit on the results' side. Bits not defined above | ||||
| cannot be set in the eia_flags field. If they | ||||
| are, the server <bcp14>MUST</bcp14> reject the operation with | ||||
| NFS4ERR_INVAL. | ||||
| </t> | ||||
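| <t> | ||||
| The rejection of undefined bits in eia_flags might be sketched as | ||||
| follows. This Python fragment is illustrative only: the flag values are | ||||
| copied from the XDR above, NFS4ERR_INVAL is given its numeric value of | ||||
| 22, and DEFINED_FLAGS and check_eia_flags are hypothetical names. The | ||||
| sketch checks only for bits that have no definition; it does not model | ||||
| any other validation a server might perform. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_OK = 0
NFS4ERR_INVAL = 22

EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001
EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002
EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000

# Union of every flag bit defined in the XDR above.
DEFINED_FLAGS = (
    EXCHGID4_FLAG_SUPP_MOVED_REFER
    | EXCHGID4_FLAG_SUPP_MOVED_MIGR
    | EXCHGID4_FLAG_BIND_PRINC_STATEID
    | EXCHGID4_FLAG_USE_NON_PNFS
    | EXCHGID4_FLAG_USE_PNFS_MDS
    | EXCHGID4_FLAG_USE_PNFS_DS
    | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A
    | EXCHGID4_FLAG_CONFIRMED_R
)


def check_eia_flags(eia_flags: int) -> int:
    # Any bit outside the defined set causes the operation to be rejected.
    undefined = eia_flags & ~DEFINED_FLAGS & 0xFFFFFFFF
    return NFS4ERR_INVAL if undefined else NFS4_OK
```
| ]]></sourcecode> | ||||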
| <t> | ||||
| The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set | ||||
| in eia_flags; it is always off in eir_flags. | ||||
| The EXCHGID4_FLAG_CONFIRMED_R bit can only be set in | ||||
| eir_flags; it is always off in eia_flags. If the | ||||
| server recognizes the co_ownerid and co_verifier | ||||
| as mapping to a confirmed client ID, it sets | ||||
| EXCHGID4_FLAG_CONFIRMED_R in eir_flags. | ||||
| The EXCHGID4_FLAG_CONFIRMED_R flag allows a client | ||||
| to tell if the client ID it is trying to create | ||||
| already exists and is confirmed. | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, | ||||
| this means that the client is attempting to update properties | ||||
| of an existing confirmed client ID (if the client wants to | ||||
| update properties of an unconfirmed client ID, it <bcp14>MUST NOT</bcp14> | ||||
| set EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). | ||||
| If so, it is | ||||
| <bcp14>RECOMMENDED</bcp14> that the client send the update EXCHANGE_ID | ||||
| operation in the same COMPOUND as a SEQUENCE so that | ||||
| the EXCHANGE_ID is executed exactly once. Whether | ||||
| the client can update the properties of the client ID | ||||
| depends on the state protection it selected when the | ||||
| client ID was created, and the principal and security | ||||
| flavor it used when sending the EXCHANGE_ID operation. | ||||
| The situations described in items | ||||
| <xref target="case_update" format="counter"/>, | ||||
| <xref target="case_update_noent" format="counter"/>, | ||||
| <xref target="case_update_exist" format="counter"/>, | ||||
| or | ||||
| <xref target="case_update_perm" format="counter"/> | ||||
| of the second numbered list of <xref target="OP_EXCHANGE_ID_IMPLEMENTATION" format="default"/> below will apply. | ||||
| Note that if the operation succeeds | ||||
| and returns a client ID that is already | ||||
| confirmed, the server <bcp14>MUST</bcp14> set the | ||||
| EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, | ||||
| this means that the client is trying to establish a new | ||||
| client ID; it is | ||||
| attempting to trunk data communication to | ||||
| the server (see <xref target="Trunking" format="default"/>); or it | ||||
| is attempting to update properties of an unconfirmed | ||||
| client ID. The | ||||
| situations described in | ||||
| items | ||||
| <xref target="case_new_owner_id" format="counter"/>, | ||||
| <xref target="case_non_update" format="counter"/>, | ||||
| <xref target="case_client_collision" format="counter"/>, | ||||
| <xref target="case_retry" format="counter"/>, or | ||||
| <xref target="case_client_restart" format="counter"/> | ||||
| of the second numbered list of <xref target="OP_EXCHANGE_ID_IMPLEMENTATION" format="default"/> below will apply. | ||||
| Note that if the operation succeeds | ||||
| and returns a client ID that was previously | ||||
| confirmed, the server <bcp14>MUST</bcp14> set the | ||||
| EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | ||||
| </t> | ||||
| <t> | ||||
| When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit | ||||
| is set, the client indicates that it is capable | ||||
| of dealing with an NFS4ERR_MOVED error as part of | ||||
| a referral sequence. When this bit is not set, it | ||||
| is still legal for the server to perform a referral | ||||
| sequence. However, a server may use the fact that | ||||
| the client is incapable of correctly responding | ||||
| to a referral, by avoiding it for that particular | ||||
| client. It may, for instance, act as a proxy | ||||
| for that particular file system, at some cost in | ||||
| performance, although it is not obligated to do so. | ||||
| If the server will potentially perform a referral, it | ||||
| <bcp14>MUST</bcp14> set EXCHGID4_FLAG_SUPP_MOVED_REFER in eir_flags. | ||||
| </t> | ||||
| <t> | ||||
| When the EXCHGID4_FLAG_SUPP_MOVED_MIGR is set, | ||||
| the client indicates that it is capable of dealing | ||||
| with an NFS4ERR_MOVED error as part of a file system | ||||
| migration sequence. When this bit is not set, it | ||||
| is still legal for the server to indicate that a | ||||
| file system has moved, when this in fact happens. | ||||
| However, a server may use the fact that the client | ||||
| is incapable of correctly responding to a migration | ||||
| in its scheduling of file systems to migrate so as to | ||||
| avoid migration of file systems being actively used. | ||||
| It may also hide actual migrations from clients | ||||
| unable to deal with them by acting as a proxy for a | ||||
| migrated file system for particular clients, at some | ||||
| cost in performance, although it is not obligated | ||||
| to do so. If the server will potentially perform a | ||||
| migration, it <bcp14>MUST</bcp14> set EXCHGID4_FLAG_SUPP_MOVED_MIGR | ||||
| in eir_flags. | ||||
| </t> | ||||
| <t> | ||||
| When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the | ||||
| client indicates that it wants the server to bind the | ||||
| stateid to the principal. This means that when a | ||||
| principal creates a stateid, it has to be the one to | ||||
| use the stateid. If the server will perform binding, | ||||
| it will return EXCHGID4_FLAG_BIND_PRINC_STATEID. The | ||||
| server <bcp14>MAY</bcp14> return EXCHGID4_FLAG_BIND_PRINC_STATEID | ||||
| even if the client does not request it. If | ||||
| an update to the client ID changes the value | ||||
| of EXCHGID4_FLAG_BIND_PRINC_STATEID's client | ||||
| ID property, the effect applies only to new | ||||
| stateids. Existing stateids (and all stateids with | ||||
| the same "other" field) that were created with | ||||
| stateid to principal binding in force will continue | ||||
| to have binding in force. Existing stateids (and all | ||||
| stateids with the same "other" field) that were created | ||||
| with stateid to principal binding not in force will continue | ||||
| to have binding not in force. | ||||
| </t> | ||||
| <t> | ||||
| The EXCHGID4_FLAG_USE_NON_PNFS, | ||||
| EXCHGID4_FLAG_USE_PNFS_MDS, and | ||||
| EXCHGID4_FLAG_USE_PNFS_DS bits are described in | ||||
| <xref target="pnfs_session_stuff"/> | ||||
| and convey roles the | ||||
| client ID is to be used for in a pNFS environment. | ||||
| The server <bcp14>MUST</bcp14> set one of the acceptable combinations | ||||
| of these bits (roles) in eir_flags, as specified in that | ||||
| section. | ||||
| Note that the same client owner/server owner pair can | ||||
| have multiple roles. Multiple roles can be associated | ||||
| with the same client ID or with different client | ||||
| IDs. Thus, if a client sends EXCHANGE_ID from the | ||||
| same client owner to the same server owner multiple | ||||
| times, but specifies different pNFS roles each time, | ||||
| the server might return different client IDs. Given | ||||
| that different pNFS roles might have different client | ||||
| IDs, the client may ask for different properties for | ||||
| each role/client ID. | ||||
| </t> | ||||
| <t> | ||||
| The spa_how field of the eia_state_protect field | ||||
| specifies how the client wants to protect its client, | ||||
| locking, and session states from unauthorized changes | ||||
| (<xref target="protect_state_change" format="default"/>): | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| SP4_NONE. The client does not request the NFSv4.1 server | ||||
| to enforce state protection. The NFSv4.1 server <bcp14>MUST NOT</bcp14> | ||||
| enforce state protection for the returned client ID. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then | ||||
| the client <bcp14>MUST</bcp14> send the EXCHANGE_ID operation with RPCSEC_GSS | ||||
| as the security flavor, and with a service of | ||||
| RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED | ||||
| is specified, then the | ||||
| client wants to use an RPCSEC_GSS-based machine | ||||
| credential to protect its state. The server <bcp14>MUST</bcp14> note | ||||
| the principal the EXCHANGE_ID operation was sent | ||||
| with, and the GSS mechanism used. These notes | ||||
| collectively comprise the machine credential. | ||||
| </t> | ||||
| <t> | ||||
| After the client ID is confirmed, as long as the lease associated with | ||||
| the client ID is unexpired, a subsequent EXCHANGE_ID | ||||
| operation that uses the same eia_clientowner.co_owner | ||||
| as the first EXCHANGE_ID <bcp14>MUST</bcp14> also use the same | ||||
| machine credential as the first EXCHANGE_ID. The | ||||
| server returns the same client ID for | ||||
| the subsequent EXCHANGE_ID as that returned from | ||||
| the first EXCHANGE_ID. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| SP4_SSV. If spa_how is SP4_SSV, then | ||||
| the client <bcp14>MUST</bcp14> send the EXCHANGE_ID operation with RPCSEC_GSS | ||||
| as the security flavor, and with a service of | ||||
| RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. | ||||
| If SP4_SSV is specified, then | ||||
| the client wants to use the SSV to protect its state. | ||||
| The server records the credential used in the request | ||||
| as the machine credential (as defined above) for | ||||
| the eia_clientowner.co_owner. | ||||
| The CREATE_SESSION operation that | ||||
| confirms the client ID <bcp14>MUST</bcp14> use the same machine | ||||
| credential. | ||||
| </li> | ||||
| </ul> | ||||
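<t>
As an informal illustration (not part of the protocol definition), the
transport requirement shared by SP4_MACH_CRED and SP4_SSV can be sketched
as a predicate over the security flavor and service of the request.
The function name and the use of strings for the enumerated values are
assumptions made for this sketch, and the error a server returns when
the predicate fails is not modeled.
</t>
<sourcecode type="python"><![CDATA[
```python
# Services that satisfy SP4_MACH_CRED and SP4_SSV, per the list above.
PROTECTED_SERVICES = frozenset({
    "RPC_GSS_SVC_INTEGRITY",
    "RPC_GSS_SVC_PRIVACY",
})


def request_satisfies_protection(spa_how, flavor, service):
    """Return True if an EXCHANGE_ID sent with (flavor, service) is
    consistent with the requested spa_how.

    All three arguments are plain strings here (an assumption of this
    sketch); on the wire they are enumerated values.
    """
    if spa_how == "SP4_NONE":
        # No state protection requested, so no transport constraint.
        return True
    if spa_how in ("SP4_MACH_CRED", "SP4_SSV"):
        return flavor == "RPCSEC_GSS" and service in PROTECTED_SERVICES
    raise ValueError("unknown spa_how: " + spa_how)
```
]]></sourcecode>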
| <t> | ||||
| When a client specifies SP4_MACH_CRED or SP4_SSV, | ||||
| it also provides two lists of operations (each | ||||
| expressed as a bitmap). The first list | ||||
| is spo_must_enforce and consists of those operations | ||||
| the client <bcp14>MUST</bcp14> send (subject to the server confirming the | ||||
| list of operations in the result of EXCHANGE_ID) with the | ||||
| machine credential (if SP4_MACH_CRED protection is | ||||
| specified) or the SSV-based credential (if SP4_SSV | ||||
| protection is used). The client <bcp14>MUST</bcp14> send the | ||||
| operations with RPCSEC_GSS credentials that specify | ||||
| the RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY | ||||
| security service. Typically, the first list of | ||||
| operations includes EXCHANGE_ID, CREATE_SESSION, | ||||
| DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, | ||||
| and DESTROY_CLIENTID. The client <bcp14>SHOULD NOT</bcp14> specify | ||||
| in this list any operations that require a filehandle | ||||
| because the server's access policies <bcp14>MAY</bcp14> conflict with | ||||
| the client's choice, and thus the client would then be | ||||
| unable to access a subset of the server's namespace. | ||||
| </t> | ||||
| <t> | ||||
| Note that if SP4_SSV protection is specified, and | ||||
| the client indicates that CREATE_SESSION must be | ||||
| protected with SP4_SSV, because the SSV cannot exist | ||||
| without a confirmed client ID, the first CREATE_SESSION | ||||
| <bcp14>MUST</bcp14> instead be sent using the machine credential, | ||||
| and the server <bcp14>MUST</bcp14> accept the machine credential. | ||||
| </t> | ||||
| <t> | ||||
| There is a corresponding result, also called spo_must_enforce, | ||||
| of the operations for which the server will require SP4_MACH_CRED or | ||||
| SP4_SSV protection. Normally, the server's result | ||||
| equals the client's argument, but the result <bcp14>MAY</bcp14> be different. | ||||
| If the client requests one or more operations in | ||||
| the set { EXCHANGE_ID, CREATE_SESSION, | ||||
| DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, | ||||
| DESTROY_CLIENTID }, then the result spo_must_enforce | ||||
| <bcp14>MUST</bcp14> include the operations the client requested from that set. | ||||
| </t> | ||||
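<t>
The constraint on the spo_must_enforce result can be illustrated with a
short sketch. It assumes operations are represented as a set of names
rather than a bitmap, and that a hypothetical server policy decides which
of the remaining requested operations are enforced. It also assumes the
server only ever drops operations from the request; whether a server may
add operations is not addressed here.
</t>
<sourcecode type="python"><![CDATA[
```python
# Operations that the server MUST echo back if the client requested them.
MANDATORY_IF_REQUESTED = frozenset({
    "EXCHANGE_ID", "CREATE_SESSION", "DELEGPURGE",
    "DESTROY_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
})


def enforce_result(requested, server_policy):
    """Compute the spo_must_enforce result for one EXCHANGE_ID.

    requested:     operations in the client's spo_must_enforce argument.
    server_policy: operations this hypothetical server is willing to
                   enforce beyond the mandatory set.
    """
    requested = frozenset(requested)
    # Requested operations from the mandatory set are always kept.
    mandatory = requested & MANDATORY_IF_REQUESTED
    # Other requested operations are kept only if server policy allows.
    optional = (requested - MANDATORY_IF_REQUESTED) & frozenset(server_policy)
    return mandatory | optional
```
]]></sourcecode>
<t>
In this sketch, connection binding enforcement is enabled exactly when
"BIND_CONN_TO_SESSION" is a member of the returned set.
</t>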
| <t> | ||||
| If spo_must_enforce in the results has BIND_CONN_TO_SESSION | ||||
| set, then connection binding enforcement is enabled, and | ||||
| the client <bcp14>MUST</bcp14> use the machine (if SP4_MACH_CRED protection is used) | ||||
| or SSV (if SP4_SSV protection is used) credential on calls | ||||
| to BIND_CONN_TO_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| The second list is spo_must_allow and consists of those | ||||
| operations | ||||
| the client wants to have the option of sending with the machine credential or | ||||
| the SSV-based credential, even if the object the | ||||
| operations are performed on is not owned by the | ||||
| machine or SSV credential. | ||||
| </t> | ||||
| <t> | ||||
| The corresponding result, also called | ||||
| spo_must_allow, consists of the operations the server | ||||
| will allow the client to use SP4_SSV or SP4_MACH_CRED | ||||
| credentials with. | ||||
| Normally, the server's result | ||||
| equals the client's argument, but the result <bcp14>MAY</bcp14> be different. | ||||
| </t> | ||||
| <t> | ||||
| The purpose of spo_must_allow is to allow clients to | ||||
| solve the following conundrum. Suppose the client ID | ||||
| is confirmed with EXCHGID4_FLAG_BIND_PRINC_STATEID, | ||||
| and it calls OPEN with the RPCSEC_GSS credentials of | ||||
| a normal user. Now suppose the user's credentials expire, | ||||
| and cannot be renewed (e.g., a Kerberos ticket granting ticket | ||||
| expires, and the user has logged off and will not be | ||||
| acquiring a new ticket granting ticket). The client will be | ||||
| unable to send CLOSE without the user's credentials, which is to | ||||
| say the client has to either leave the state on the server | ||||
| or re-send EXCHANGE_ID with a new verifier to | ||||
| clear all state, that is, unless the client includes | ||||
| CLOSE on the list of operations in spo_must_allow and the | ||||
| server agrees. | ||||
| </t> | ||||
| <t> | ||||
| The SP4_SSV protection parameters also have: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>ssp_hash_algs:</dt> | ||||
| <dd><t> | ||||
| This is the set of algorithms the client supports | ||||
| for the purpose of computing the digests needed for | ||||
| the internal SSV GSS mechanism and for the SET_SSV | ||||
| operation. Each algorithm is specified as an object | ||||
| identifier (OID). The <bcp14>REQUIRED</bcp14> algorithms for a | ||||
| server are id-sha1, id-sha224, id-sha256, id-sha384, | ||||
| and id-sha512 <xref target="RFC4055" format="default"/>.</t> | ||||
| <t> | ||||
| Due to known weaknesses in id-sha1, it is <bcp14>RECOMMENDED</bcp14> | ||||
| that the client specify at least one | ||||
| algorithm within ssp_hash_algs other than id-sha1.</t> | ||||
| <t> | ||||
| The algorithm the server selects among the | ||||
| set is indicated in spi_hash_alg, a field of | ||||
| spr_ssv_prot_info. The field spi_hash_alg is an | ||||
| index into the array ssp_hash_algs. Because of | ||||
| the known weaknesses in id-sha1, it is <bcp14>RECOMMENDED</bcp14> that | ||||
| it not be selected by the server as long as ssp_hash_algs | ||||
| contains any other supported algorithm.</t> | ||||
| <t> | ||||
| If the server | ||||
| does not support any of the offered algorithms, | ||||
| it returns NFS4ERR_HASH_ALG_UNSUPP. | ||||
| If ssp_hash_algs is empty, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_INVAL. </t> | ||||
| </dd> | ||||
| <dt>ssp_encr_algs:</dt> | ||||
| <dd> | ||||
| This is the set of algorithms the client supports for the | ||||
| purpose of providing privacy protection for the internal | ||||
| SSV GSS mechanism. Each algorithm is | ||||
| specified as an OID. | ||||
| The <bcp14>REQUIRED</bcp14> algorithm for a server is id-aes256-CBC. | ||||
| The <bcp14>RECOMMENDED</bcp14> algorithms are id-aes192-CBC and id-aes128-CBC | ||||
| <xref target="CSOR_AES" format="default"/>. The selected algorithm is | ||||
| returned in spi_encr_alg, an index into ssp_encr_algs. | ||||
| If the server | ||||
| does not support any of the offered algorithms, | ||||
| it returns NFS4ERR_ENCR_ALG_UNSUPP. | ||||
| If ssp_encr_algs is empty, the server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| Note that due to previously stated requirements and recommendations | ||||
| on the relationships between key length and hash length, some | ||||
| combinations of <bcp14>RECOMMENDED</bcp14> and <bcp14>REQUIRED</bcp14> encryption algorithm and | ||||
| hash algorithm either <bcp14>SHOULD NOT</bcp14> or <bcp14>MUST NOT</bcp14> be used. | ||||
| <xref target="algtbl" format="default"/> summarizes the illegal and discouraged | ||||
| combinations. | ||||
| </dd> | ||||
| <dt>ssp_window:</dt> | ||||
| <dd> | ||||
| This is the number of SSV versions the client wants | ||||
| the server to maintain (i.e., each successful call to SET_SSV | ||||
| produces a new version of the SSV). If ssp_window is zero, the | ||||
| server <bcp14>MUST</bcp14> return NFS4ERR_INVAL. The server responds | ||||
| with spi_window, which <bcp14>MUST NOT</bcp14> exceed ssp_window and <bcp14>MUST</bcp14> | ||||
| be at least one. | ||||
| Any requests on the backchannel or fore channel that | ||||
| are using a version of the SSV that is outside the window will fail with | ||||
| an ONC RPC authentication error, and the requester | ||||
| will have to retry them with the same slot ID and | ||||
| sequence ID. | ||||
| </dd> | ||||
| <dt>ssp_num_gss_handles:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| This is the number of RPCSEC_GSS handles the | ||||
| server should create that are based on the GSS | ||||
| SSV mechanism (see | ||||
| <xref target="ssv_mech" format="default"/>). | ||||
| It is not the total number of RPCSEC_GSS handles for | ||||
| the client ID. Indeed, subsequent calls to EXCHANGE_ID | ||||
| will add RPCSEC_GSS handles. | ||||
| The server responds with a list of handles in | ||||
| spi_handles. If the client asks for at least | ||||
| one handle and the server cannot create it, | ||||
| the server <bcp14>MUST</bcp14> return an error. The handles in | ||||
| spi_handles are not available for use until the | ||||
| client ID is confirmed, which could be immediately | ||||
| if EXCHANGE_ID returns EXCHGID4_FLAG_CONFIRMED_R, | ||||
| or upon successful confirmation from CREATE_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| While a client ID can span all the connections | ||||
| that are connected to a server sharing the same | ||||
| eir_server_owner.so_major_id, the RPCSEC_GSS | ||||
| handles returned in spi_handles can only be used | ||||
| on connections connected to a server that returns | ||||
| the same eir_server_owner.so_major_id and | ||||
| eir_server_owner.so_minor_id on each connection. | ||||
| It is permissible for the client to set | ||||
| ssp_num_gss_handles to zero; the client can | ||||
| create more handles with another EXCHANGE_ID call. | ||||
| </t> | ||||
| <t> | ||||
| Because each SSV RPCSEC_GSS handle shares a common SSV GSS context, | ||||
| there are security considerations specific to this situation | ||||
| discussed in <xref target="rpcsec_ssv_consider" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The seq_window (see Section <xref target="RFC2203" sectionFormat="bare" section="5.2.3.1"/> of RFC 2203 | ||||
| <xref target="RFC2203" format="default"/>) | ||||
| of each RPCSEC_GSS handle in spi_handles | ||||
| <bcp14>MUST</bcp14> be the same as the seq_window of | ||||
| the RPCSEC_GSS handle used for the credential of the RPC request | ||||
| of which the EXCHANGE_ID operation was sent as a part. | ||||
| </t> | ||||
| </dd> | ||||
| </dl> | ||||
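<t>
The server-side handling of ssp_hash_algs and ssp_window described above
might be sketched as follows. This is an illustration under stated
assumptions: algorithms are named by strings rather than OIDs, the
exception class and the server_max parameter are hypothetical, and the
preference order among non-SHA-1 algorithms (first offered wins) is a
choice of this sketch, not a protocol requirement.
</t>
<sourcecode type="python"><![CDATA[
```python
ID_SHA1 = "id-sha1"


class Nfs4Error(Exception):
    """Hypothetical carrier for an NFSv4.1 error name."""


def select_hash_alg(ssp_hash_algs, supported):
    """Return spi_hash_alg, an index into ssp_hash_algs."""
    if not ssp_hash_algs:
        raise Nfs4Error("NFS4ERR_INVAL")
    candidates = [i for i, alg in enumerate(ssp_hash_algs)
                  if alg in supported]
    if not candidates:
        raise Nfs4Error("NFS4ERR_HASH_ALG_UNSUPP")
    # Avoid id-sha1 while any other supported algorithm is offered.
    for i in candidates:
        if ssp_hash_algs[i] != ID_SHA1:
            return i
    return candidates[0]


def negotiate_window(ssp_window, server_max):
    """Return spi_window: at least one, never more than ssp_window."""
    if ssp_window == 0:
        raise Nfs4Error("NFS4ERR_INVAL")
    return max(1, min(ssp_window, server_max))
```
]]></sourcecode>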
| <table anchor="algtbl" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Encryption Algorithm</th> | ||||
| <th align="left"><bcp14>MUST NOT</bcp14> be combined with</th> | ||||
| <th align="left"><bcp14>SHOULD NOT</bcp14> be combined with</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">id-aes128-CBC</td> | ||||
| <td align="left"/> | ||||
| <td align="left">id-sha384, id-sha512</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">id-aes192-CBC</td> | ||||
| <td align="left">id-sha1</td> | ||||
| <td align="left">id-sha512</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">id-aes256-CBC</td> | ||||
| <td align="left">id-sha1, id-sha224</td> | ||||
| <td align="left"/> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
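<t>
For implementers who want to check a candidate pairing mechanically, the
table above can be transcribed into a lookup. The following is a minimal
sketch; the function name, the string return values, and the use of
algorithm names in place of OIDs are assumptions of the sketch.
</t>
<sourcecode type="python"><![CDATA[
```python
# Transcription of the table: encryption algorithm -> hash algorithms.
MUST_NOT = {
    "id-aes128-CBC": frozenset(),
    "id-aes192-CBC": frozenset({"id-sha1"}),
    "id-aes256-CBC": frozenset({"id-sha1", "id-sha224"}),
}
SHOULD_NOT = {
    "id-aes128-CBC": frozenset({"id-sha384", "id-sha512"}),
    "id-aes192-CBC": frozenset({"id-sha512"}),
    "id-aes256-CBC": frozenset(),
}


def classify_combination(encr_alg, hash_alg):
    """Classify a pairing as 'MUST NOT', 'SHOULD NOT', or 'ok'."""
    if encr_alg not in MUST_NOT:
        raise ValueError("encryption algorithm not in table: " + encr_alg)
    if hash_alg in MUST_NOT[encr_alg]:
        return "MUST NOT"
    if hash_alg in SHOULD_NOT[encr_alg]:
        return "SHOULD NOT"
    return "ok"
```
]]></sourcecode>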
| <t> | ||||
| The arguments include an array of up to one | ||||
| element in length called eia_client_impl_id. If | ||||
| eia_client_impl_id is present, it contains the | ||||
| information identifying the implementation of the | ||||
| client. Similarly, the results include an array of up | ||||
| to one element in length called eir_server_impl_id | ||||
| that identifies the implementation of the server. | ||||
| Servers <bcp14>MUST</bcp14> accept a zero-length eia_client_impl_id | ||||
| array, and clients <bcp14>MUST</bcp14> accept a zero-length | ||||
| eir_server_impl_id array. | ||||
| </t> | ||||
| <t> | ||||
| A possible use for implementation identifiers | ||||
| would be in diagnostic software that extracts | ||||
| this information in an attempt to identify | ||||
| interoperability problems, performance workload | ||||
| behaviors, or general usage statistics. Since the | ||||
| intent of having access to this information is for | ||||
| planning or general diagnosis only, the client and | ||||
| server <bcp14>MUST NOT</bcp14> interpret this implementation | ||||
| identity information in a way that affects | ||||
| how the implementation interacts with | ||||
| its peer. The client and server are not | ||||
| allowed to depend on the peer's manifesting a particular | ||||
| allowed behavior based on an implementation identifier | ||||
| but are required to interoperate as specified elsewhere | ||||
| in the protocol specification. | ||||
| </t> | ||||
| <t> | ||||
| Because it is possible that some implementations might | ||||
| violate the protocol specification and interpret | ||||
| the identity information, implementations <bcp14>MUST</bcp14> | ||||
| provide facilities to allow the NFSv4 client and server | ||||
| to be configured to set the contents of the nfs_impl_id structures sent | ||||
| to any specified value. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="OP_EXCHANGE_ID_IMPLEMENTATION" toc="exclude" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| A server's client record is a 5-tuple: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| <t>co_ownerid: | ||||
| </t> | ||||
| <t> | ||||
| The client identifier string, from the eia_clientowner | ||||
| structure of the EXCHANGE_ID4args structure.</t> | ||||
| </li> | ||||
| <li> | ||||
| <t>co_verifier: | ||||
| </t> | ||||
| <t>A client-specific value used to indicate incarnations (where a client restart represents a new incarnation), from the | ||||
| eia_clientowner structure of the EXCHANGE_ID4args | ||||
| structure.</t> | ||||
| </li> | ||||
| <li> | ||||
| <t>principal: | ||||
| </t> | ||||
| <t> | ||||
| The principal that was defined in the RPC header's credential | ||||
| and/or verifier at the time the client record was | ||||
| established. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t>client ID: | ||||
| </t> | ||||
| <t>The shorthand client identifier, generated by the server and | ||||
| returned via the eir_clientid field in the EXCHANGE_ID4resok | ||||
| structure.</t> | ||||
| </li> | ||||
| <li> | ||||
| <t>confirmed: | ||||
| </t> | ||||
| <t>A private field on the server indicating whether or not a | ||||
| client record has been confirmed. A client record is | ||||
| confirmed if there has been a successful CREATE_SESSION | ||||
| operation to confirm it. Otherwise, it is unconfirmed. An | ||||
| unconfirmed record is established by an EXCHANGE_ID call. | ||||
| Any unconfirmed record that is not confirmed within a lease | ||||
| period <bcp14>SHOULD</bcp14> be removed.</t> | ||||
| </li> | ||||
| </ol> | ||||
| <!-- [auth] start new list --> | ||||
| <t> | ||||
| The following identifiers represent special values for the fields | ||||
| in the records. | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>ownerid_arg:</dt> | ||||
| <dd> | ||||
| The value of the eia_clientowner.co_ownerid subfield of the | ||||
| EXCHANGE_ID4args structure of the current request. | ||||
| </dd> | ||||
| <dt>verifier_arg:</dt> | ||||
| <dd> | ||||
| The value of the eia_clientowner.co_verifier subfield of the | ||||
| EXCHANGE_ID4args structure of the current request. | ||||
| </dd> | ||||
| <dt>old_verifier_arg:</dt> | ||||
| <dd> | ||||
| A value of the eia_clientowner.co_verifier field of a client record | ||||
| received in a previous request; this is distinct from | ||||
| verifier_arg. | ||||
| </dd> | ||||
| <dt>principal_arg:</dt> | ||||
| <dd> | ||||
| The value of the RPCSEC_GSS principal for the current request. | ||||
| </dd> | ||||
| <dt>old_principal_arg:</dt> | ||||
| <dd> | ||||
| A value of the principal of a client record as defined by the | ||||
| RPC header's credential or verifier of a previous request. | ||||
| This is distinct from principal_arg. | ||||
| </dd> | ||||
| <dt>clientid_ret:</dt> | ||||
| <dd> | ||||
| The value of the eir_clientid field the server will return in the | ||||
| EXCHANGE_ID4resok structure for the current request. | ||||
| </dd> | ||||
| <dt>old_clientid_ret:</dt> | ||||
| <dd> | ||||
| The value of the eir_clientid field the server returned in the | ||||
| EXCHANGE_ID4resok structure for a previous request. This | ||||
| is distinct from clientid_ret. | ||||
| </dd> | ||||
| <dt>confirmed:</dt> | ||||
| <dd> | ||||
| The client ID has been confirmed. | ||||
| </dd> | ||||
| <dt>unconfirmed:</dt> | ||||
| <dd> | ||||
| The client ID has not been confirmed. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| Since EXCHANGE_ID is a non-idempotent operation, we must | ||||
| consider the possibility that retries occur as a result of a | ||||
| client restart, network partition, malfunctioning router, etc. | ||||
| Retries are identified by the value of the eia_clientowner field of | ||||
| EXCHANGE_ID4args, and the method for dealing with them is | ||||
| outlined in the scenarios below. | ||||
| </t> | ||||
| <t> | ||||
| The scenarios are described in terms of the | ||||
| client record(s) a server has for a given | ||||
| co_ownerid. Note that if the client ID | ||||
| was created specifying SP4_SSV state protection and | ||||
| EXCHANGE_ID as one of the operations in spo_must_allow, | ||||
| then the server <bcp14>MUST</bcp14> authorize EXCHANGE_IDs with the SSV | ||||
| principal in addition to the principal that created the | ||||
| client ID. | ||||
| </t> | ||||
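<t>
As a reading aid, the scenarios in the list that follows for requests
without EXCHGID4_FLAG_UPD_CONFIRMED_REC_A can be summarized as a
decision procedure over the records held for a given co_ownerid. The
sketch below is informal: the ClientRecord type, the function name, and
the returned labels are hypothetical, it assumes at most one confirmed
and one unconfirmed record per co_ownerid, and it ignores the SSV
principal rule stated above as well as lease and state checks.
</t>
<sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRecord:
    """The server's 5-tuple client record (field names are illustrative)."""
    co_ownerid: bytes
    co_verifier: bytes
    principal: str
    clientid: int
    confirmed: bool


def classify_non_update(verifier_arg, principal_arg,
                        confirmed_rec, unconfirmed_rec):
    """Name the scenario for a non-update EXCHANGE_ID.

    confirmed_rec / unconfirmed_rec are the records (or None) that the
    server holds for the request's co_ownerid.
    """
    if confirmed_rec is not None:
        if confirmed_rec.principal != principal_arg:
            # Same owner string, different principal.
            return "client_collision"
        if confirmed_rec.co_verifier == verifier_arg:
            # Retry or trunking probe: same verifier and principal.
            return "non_update"
        # Same principal, new verifier: a new client incarnation.
        return "client_restart"
    if unconfirmed_rec is not None:
        return "replace_unconfirmed"
    return "new_owner_id"
```
]]></sourcecode>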
| <ol spacing="normal" type="1"> | ||||
| <li anchor="case_new_owner_id"> | ||||
| <t>New Owner ID | ||||
| </t> | ||||
| <t> | ||||
| If the server has no client records | ||||
| with eia_clientowner.co_ownerid matching | ||||
| ownerid_arg, and EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not | ||||
| set in the EXCHANGE_ID, then a new shorthand | ||||
| client ID (let us call it clientid_ret) | ||||
| is generated, and the following unconfirmed | ||||
| record is added to the server's state. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, unconfirmed } | ||||
| </t> | ||||
| <t> | ||||
| Subsequently, the server returns clientid_ret. | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_non_update"> | ||||
| <t>Non-Update on Existing Client ID</t> | ||||
| <t> | ||||
| If the server has the following confirmed record, and | ||||
| the request does not have | ||||
| EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set, | ||||
| then the request is the result of a retried request due to a | ||||
| faulty router or lost connection, or | ||||
| the client is trying to determine if it can perform | ||||
| trunking. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| Since the record has been confirmed, the client | ||||
| must have received the server's reply from | ||||
| the initial EXCHANGE_ID request. Since the | ||||
| server has a confirmed record, and since | ||||
| EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, with the | ||||
| possible exception of eir_server_owner.so_minor_id, the | ||||
| server returns the same result it did when | ||||
| the client ID's properties were last updated | ||||
| (or if never updated, the result when the | ||||
| client ID was created). The confirmed record | ||||
| is unchanged. | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_client_collision"> | ||||
| <t>Client Collision | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and | ||||
| if the server has the following confirmed | ||||
| record, then this request is likely the result | ||||
| of a chance collision between the values of | ||||
| the eia_clientowner.co_ownerid subfield of | ||||
| EXCHANGE_ID4args for two different clients. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, *, old_principal_arg, old_clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| If there is currently no state associated with old_clientid_ret, | ||||
| or if there is state but the lease has expired, then | ||||
| this case is effectively equivalent to the | ||||
| New Owner ID case of <xref target="case_new_owner_id" format="default"/>. | ||||
| The confirmed record is deleted, the old_clientid_ret and its | ||||
| lock state are deleted, | ||||
| a new shorthand client ID | ||||
| is generated, and the following unconfirmed | ||||
| record is added to the server's state. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, unconfirmed } | ||||
| </t> | ||||
| <t> | ||||
| Subsequently, the server returns clientid_ret. | ||||
| </t> | ||||
| <t> | ||||
| If old_clientid_ret has an unexpired lease with state, then | ||||
| no state of old_clientid_ret is changed or deleted. | ||||
| The server returns NFS4ERR_CLID_INUSE | ||||
| to indicate that the client should | ||||
| retry with a different value for the | ||||
| eia_clientowner.co_ownerid subfield of | ||||
| EXCHANGE_ID4args. The client record is not changed.</t> | ||||
| </li> | ||||
| <li anchor="case_retry"> | ||||
| <t>Replacement of Unconfirmed Record | ||||
| </t> | ||||
| <t> | ||||
| If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, | ||||
| and the server has the following unconfirmed record, then | ||||
| the client is attempting EXCHANGE_ID again on an | ||||
| unconfirmed client ID, perhaps due to a retry, a client | ||||
| restart before client ID confirmation (i.e., | ||||
| before CREATE_SESSION was called), or | ||||
| some other reason. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, *, *, old_clientid_ret, unconfirmed } | ||||
| </t> | ||||
| <t> | ||||
| It is possible that | ||||
| the properties of old_clientid_ret are | ||||
| different than those specified in the current | ||||
| EXCHANGE_ID. Whether or not the properties are being updated, | ||||
| to eliminate ambiguity, the server | ||||
| deletes the unconfirmed record, generates a | ||||
| new client ID (clientid_ret), and establishes | ||||
| the following unconfirmed record: | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, unconfirmed } | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_client_restart"> | ||||
| <t>Client Restart</t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and | ||||
| if the server has the following confirmed client record, then | ||||
| this request is likely from a previously confirmed client | ||||
| that has restarted. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, old_verifier_arg, principal_arg, old_clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| Since the previous incarnation of the same | ||||
| client will no longer be making requests, | ||||
| once the new client ID is confirmed by | ||||
| CREATE_SESSION, byte-range locks and share reservations | ||||
| should be released immediately rather than | ||||
| forcing the new incarnation to wait for | ||||
| the lease time on the previous incarnation | ||||
| to expire. Furthermore, session state should | ||||
| be removed since if the client had maintained | ||||
| that information across restart, this request | ||||
| would not have been sent. If the server | ||||
| supports neither the CLAIM_DELEGATE_PREV | ||||
| nor CLAIM_DELEG_PREV_FH | ||||
| claim types, associated delegations should be | ||||
| purged as well; otherwise, delegations are | ||||
| retained and recovery proceeds according to | ||||
| <xref target="delegation_recovery" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| After processing, clientid_ret is returned to the client and | ||||
| this client record is added: | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, unconfirmed } | ||||
| </t> | ||||
| <t> | ||||
| The previously described confirmed record | ||||
| continues to exist, and thus the same | ||||
| ownerid_arg exists in both a confirmed and | ||||
| unconfirmed state at the same time. The number | ||||
| of states can collapse to one once the server | ||||
| receives an applicable CREATE_SESSION or | ||||
| EXCHANGE_ID. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the server subsequently receives a successful | ||||
| CREATE_SESSION that confirms clientid_ret, | ||||
| then the server atomically destroys the | ||||
| confirmed record and makes the unconfirmed | ||||
| record confirmed as described in | ||||
| <xref target="OP_CREATE_SESSION_DESCRIPTION" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| If the server instead subsequently receives | ||||
| an EXCHANGE_ID with the client owner equal | ||||
| to ownerid_arg, one strategy is to simply | ||||
| delete the unconfirmed record, and process the | ||||
| EXCHANGE_ID as described in the entirety of | ||||
| <xref target="OP_EXCHANGE_ID_IMPLEMENTATION" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li anchor="case_update"> | ||||
| <t>Update | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the | ||||
| server has the following confirmed record, | ||||
| then this request is an attempt at an update. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, principal_arg, clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| Since the record has been confirmed, the client must have | ||||
| received the server's reply from the initial EXCHANGE_ID | ||||
| request. The server allows the update, and the client record | ||||
| is left intact. | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_update_noent"> | ||||
| <t>Update but No Confirmed Record | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the | ||||
| server has no confirmed record corresponding to ownerid_arg, | ||||
| then the server returns NFS4ERR_NOENT and leaves any unconfirmed | ||||
| record intact. | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_update_exist"> | ||||
| <t>Update but Wrong Verifier | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the | ||||
| server has the following confirmed record, | ||||
| then this request is an illegal attempt at an | ||||
| update, perhaps because of a retry from a previous client | ||||
| incarnation. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| The server returns NFS4ERR_NOT_SAME and leaves the client record | ||||
| intact. | ||||
| </t> | ||||
| </li> | ||||
| <li anchor="case_update_perm"> | ||||
| <t>Update but Wrong Principal | ||||
| </t> | ||||
| <t> | ||||
| If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the | ||||
| server has the following confirmed record, | ||||
| then this request is an illegal attempt at an | ||||
| update by an unauthorized principal. | ||||
| </t> | ||||
| <t> | ||||
| { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, confirmed } | ||||
| </t> | ||||
| <t> | ||||
| The server returns NFS4ERR_PERM and leaves the client record | ||||
| intact. | ||||
| </t> | ||||
| </li> | ||||
| </ol> | ||||
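The four update cases above can be read as a single decision procedure. The following is a minimal sketch only, assuming that client records are held as simple in-memory objects and that status codes are represented by their names; `ClientRecord` and `check_update` are hypothetical names used for illustration and are not part of the protocol.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes are modeled by name only; numeric values are not needed here.
NFS4_OK = "NFS4_OK"
NFS4ERR_NOENT = "NFS4ERR_NOENT"
NFS4ERR_NOT_SAME = "NFS4ERR_NOT_SAME"
NFS4ERR_PERM = "NFS4ERR_PERM"


@dataclass
class ClientRecord:
    """Hypothetical in-memory form of { ownerid, verifier, principal, clientid, state }."""
    ownerid: bytes
    verifier: bytes
    principal: str
    clientid: int
    confirmed: bool


def check_update(confirmed_rec: Optional[ClientRecord],
                 verifier_arg: bytes,
                 principal_arg: str) -> str:
    """Classify an EXCHANGE_ID that has EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set.

    confirmed_rec is the confirmed record for ownerid_arg, or None if the
    server has no such record. No record is modified in any of the cases.
    """
    if confirmed_rec is None:
        return NFS4ERR_NOENT        # Update but No Confirmed Record
    if confirmed_rec.verifier != verifier_arg:
        return NFS4ERR_NOT_SAME     # Update but Wrong Verifier
    if confirmed_rec.principal != principal_arg:
        return NFS4ERR_PERM         # Update but Wrong Principal
    return NFS4_OK                  # Update: allowed, record left intact
```

The verifier is tested before the principal because the wrong-verifier case matches any principal (the `*` in its record pattern).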
| </section> | ||||
| </section> | ||||
| <!-- $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CREATE_SESSION" numbered="true" toc="default"> | ||||
| <name>Operation 43: CREATE_SESSION - Create New Session and Confirm Client ID</name> | ||||
| <section toc="exclude" anchor="OP_CREATE_SESSION_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct channel_attrs4 { | ||||
| count4 ca_headerpadsize; | ||||
| count4 ca_maxrequestsize; | ||||
| count4 ca_maxresponsesize; | ||||
| count4 ca_maxresponsesize_cached; | ||||
| count4 ca_maxoperations; | ||||
| count4 ca_maxrequests; | ||||
| uint32_t ca_rdma_ird<1>; | ||||
| }; | ||||
| const CREATE_SESSION4_FLAG_PERSIST = 0x00000001; | ||||
| const CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002; | ||||
| const CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004; | ||||
| struct CREATE_SESSION4args { | ||||
| clientid4 csa_clientid; | ||||
| sequenceid4 csa_sequence; | ||||
| uint32_t csa_flags; | ||||
| channel_attrs4 csa_fore_chan_attrs; | ||||
| channel_attrs4 csa_back_chan_attrs; | ||||
| uint32_t csa_cb_program; | ||||
| callback_sec_parms4 csa_sec_parms<>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_SESSION_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CREATE_SESSION4resok { | ||||
| sessionid4 csr_sessionid; | ||||
| sequenceid4 csr_sequence; | ||||
| uint32_t csr_flags; | ||||
| channel_attrs4 csr_fore_chan_attrs; | ||||
| channel_attrs4 csr_back_chan_attrs; | ||||
| }; | ||||
| union CREATE_SESSION4res switch (nfsstat4 csr_status) { | ||||
| case NFS4_OK: | ||||
| CREATE_SESSION4resok csr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_SESSION_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is used by the client to create new session objects | ||||
| on the server. | ||||
| </t> | ||||
| <t> | ||||
| CREATE_SESSION can be sent with or without a preceding SEQUENCE | ||||
| operation in the same COMPOUND procedure. | ||||
| If CREATE_SESSION is sent with a preceding SEQUENCE | ||||
| operation, | ||||
| any session created by CREATE_SESSION has no direct | ||||
| relation to the session specified in the SEQUENCE operation, although | ||||
| the two sessions might be associated with the same client ID. | ||||
| If CREATE_SESSION is sent without a preceding SEQUENCE, then it | ||||
| <bcp14>MUST</bcp14> be the only operation in the COMPOUND procedure's request. If | ||||
| it is not, the server <bcp14>MUST</bcp14> return NFS4ERR_NOT_ONLY_OP. | ||||
| </t> | ||||
| <t> | ||||
| In addition to creating a session, CREATE_SESSION has the following | ||||
| effects: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The first session created with a new | ||||
| client ID serves to confirm the | ||||
| creation of that | ||||
| client's state on the server. The server returns the parameter | ||||
| values for the new session. | ||||
| </li> | ||||
| <li> | ||||
| The connection that CREATE_SESSION is sent over is associated with the | ||||
| session's fore channel. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The arguments and results of CREATE_SESSION are described as follows: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>csa_clientid:</dt> | ||||
| <dd> | ||||
| This is the client ID with which the new session will be associated. | ||||
| The corresponding result is csr_sessionid, the session ID | ||||
| of the new session. | ||||
| </dd> | ||||
| <dt>csa_sequence:</dt> | ||||
| <dd> | ||||
| Each client ID serializes CREATE_SESSION via a per-client ID | ||||
| sequence number (see | ||||
| <xref target="OP_CREATE_SESSION_IMPLEMENTATION" format="default"/>). | ||||
| The corresponding result is csr_sequence, which <bcp14>MUST</bcp14> be equal to | ||||
| csa_sequence. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| In the next three arguments, the client offers a value | ||||
| that is to be a property of the session. Except where | ||||
| stated otherwise, it is <bcp14>RECOMMENDED</bcp14> that | ||||
| the server accept the value. | ||||
| If it is not acceptable, the server <bcp14>MAY</bcp14> use a different value. | ||||
| Regardless, the server <bcp14>MUST</bcp14> return the value the session will | ||||
| use (which will be either what the client offered, or what | ||||
| the server is insisting on) to the client. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>csa_flags:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| The csa_flags field contains a list of the following flag | ||||
| bits: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>CREATE_SESSION4_FLAG_PERSIST:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| If CREATE_SESSION4_FLAG_PERSIST is set, the client | ||||
| wants the server to provide a persistent reply cache. | ||||
| For sessions in which only idempotent operations | ||||
| will be used (e.g., a read-only session), clients | ||||
| <bcp14>SHOULD NOT</bcp14> set CREATE_SESSION4_FLAG_PERSIST. If | ||||
| the server does not or cannot provide a persistent reply cache, | ||||
| the server <bcp14>MUST NOT</bcp14> set CREATE_SESSION4_FLAG_PERSIST in | ||||
| the field csr_flags. | ||||
| </t> | ||||
| <t> | ||||
| If the server is a pNFS metadata server, for | ||||
| reasons described in <xref target="obtaining_layout" format="default"/> | ||||
| it <bcp14>SHOULD</bcp14> support CREATE_SESSION4_FLAG_PERSIST if it | ||||
| supports the layout_hint (<xref target="attrdef_layout_hint" format="default"/>) | ||||
| attribute. | ||||
| </t> | ||||
| </dd> | ||||
| <dt>CREATE_SESSION4_FLAG_CONN_BACK_CHAN:</dt> | ||||
| <dd> | ||||
| If CREATE_SESSION4_FLAG_CONN_BACK_CHAN is set in csa_flags, | ||||
| the client is requesting that the connection over which the | ||||
| CREATE_SESSION operation arrived be associated with the session's | ||||
| backchannel in addition to its fore channel. | ||||
| If the server agrees, it | ||||
| sets CREATE_SESSION4_FLAG_CONN_BACK_CHAN | ||||
| in the result field csr_flags. If | ||||
| CREATE_SESSION4_FLAG_CONN_BACK_CHAN is not set in csa_flags, | ||||
| then CREATE_SESSION4_FLAG_CONN_BACK_CHAN <bcp14>MUST NOT</bcp14> be set | ||||
| in csr_flags. | ||||
| </dd> | ||||
| <dt>CREATE_SESSION4_FLAG_CONN_RDMA:</dt> | ||||
| <dd> | ||||
| If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, | ||||
| and if the connection over which the CREATE_SESSION operation | ||||
| arrived | ||||
| is currently in non-RDMA mode but | ||||
| has the capability to operate in RDMA mode, then the client | ||||
| is requesting that the server "step up" to RDMA mode | ||||
| on the connection. | ||||
| If the server agrees, it sets | ||||
| CREATE_SESSION4_FLAG_CONN_RDMA in the result | ||||
| field csr_flags. If CREATE_SESSION4_FLAG_CONN_RDMA is | ||||
| not set in csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA <bcp14>MUST | ||||
| NOT</bcp14> be set in csr_flags. | ||||
| Note that once the server agrees to step up, it and the client | ||||
| <bcp14>MUST</bcp14> exchange all future traffic on the connection with RPC RDMA | ||||
| framing and not Record Marking (<xref target="RFC8166" format="default"/>). | ||||
| </dd> | ||||
| </dl> | ||||
| </dd> | ||||
| <dt>csa_fore_chan_attrs, csa_back_chan_attrs:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| The csa_fore_chan_attrs and csa_back_chan_attrs | ||||
| fields apply to attributes of the | ||||
| fore channel (which conveys | ||||
| requests originating from the client to the server), | ||||
| and the backchannel (the channel that conveys | ||||
| callback requests originating from the | ||||
| server to the client), respectively. The results are in corresponding structures | ||||
| called csr_fore_chan_attrs and csr_back_chan_attrs. | ||||
| The results establish the attributes of each channel, which apply | ||||
| to all subsequent use of that channel of the session. | ||||
| Each structure has the following fields: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>ca_headerpadsize:</dt> | ||||
| <dd> | ||||
| <t> | ||||
| The maximum amount of padding the requester is willing to apply | ||||
| to ensure that write payloads are aligned on some boundary at | ||||
| the replier. For each channel, the server | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| will reply in ca_headerpadsize with | ||||
| its preferred value, | ||||
| or zero if padding is not in use, and | ||||
| </li> | ||||
| <li> | ||||
| <bcp14>MAY</bcp14> decrease this value but <bcp14>MUST NOT</bcp14> increase it. | ||||
| </li> | ||||
| </ul> | ||||
| </dd> | ||||
| <dt>ca_maxrequestsize:</dt> | ||||
| <dd> | ||||
| The maximum size of a COMPOUND or CB_COMPOUND request that | ||||
| will be sent. This size represents the XDR encoded size of | ||||
| the request, including the RPC headers (including | ||||
| security flavor credentials and verifiers) | ||||
| but excludes any RPC transport framing headers. | ||||
| Imagine a request coming over a non-RDMA TCP/IP connection, and | ||||
| that it has a single Record Marking header preceding | ||||
| it. The maximum allowable | ||||
| count encoded in the header will be | ||||
| ca_maxrequestsize. If a requester sends | ||||
| a request that exceeds ca_maxrequestsize, the error | ||||
| NFS4ERR_REQ_TOO_BIG will be returned per the description in | ||||
| <xref target="COMPOUND_Sizing_Issues" format="default"/>. | ||||
| For each channel, | ||||
| the server <bcp14>MAY</bcp14> decrease this value but <bcp14>MUST NOT</bcp14> increase it. | ||||
| </dd> | ||||
| <dt>ca_maxresponsesize:</dt> | ||||
| <dd> | ||||
| The maximum size of a COMPOUND or CB_COMPOUND reply that | ||||
| the requester will | ||||
| accept from the replier including RPC headers (see | ||||
| the ca_maxrequestsize definition). | ||||
| For each channel, the server <bcp14>MAY</bcp14> decrease this value, but <bcp14>MUST | ||||
| NOT</bcp14> increase it. | ||||
| However, if the client selects a value for | ||||
| ca_maxresponsesize such that a replier on a channel could | ||||
| never send a response, the server <bcp14>SHOULD</bcp14> return | ||||
| NFS4ERR_TOOSMALL in the CREATE_SESSION reply. | ||||
| After the session is created, if a requester sends a | ||||
| request for which the size of the reply would exceed | ||||
| this value, the replier will return NFS4ERR_REP_TOO_BIG, | ||||
| per the description in | ||||
| <xref target="COMPOUND_Sizing_Issues" format="default"/>. | ||||
| </dd> | ||||
| <dt>ca_maxresponsesize_cached:</dt> | ||||
| <dd> | ||||
| Like ca_maxresponsesize, but the maximum size of a reply | ||||
| that will be stored in the reply cache | ||||
| (<xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/>). | ||||
| For each channel, the server <bcp14>MAY</bcp14> decrease this | ||||
| value, but <bcp14>MUST NOT</bcp14> increase it. | ||||
| If, in the reply to CREATE_SESSION, the value of | ||||
| ca_maxresponsesize_cached of a channel is less than the value | ||||
| of ca_maxresponsesize of the same channel, then this is an | ||||
| indication to the requester that it needs to be selective | ||||
| about which replies it directs the replier to cache; for | ||||
| example, large replies from idempotent operations (e.g., | ||||
| COMPOUND requests with a READ operation) should not be | ||||
| cached. The requester decides which replies to cache via an | ||||
| argument to the SEQUENCE (the sa_cachethis field, see <xref target="OP_SEQUENCE" format="default"/>) or CB_SEQUENCE (the csa_cachethis | ||||
| field, see <xref target="OP_CB_SEQUENCE" format="default"/>) operations. | ||||
| After the session is created, if a requester sends a | ||||
| request for which the size of the reply would exceed | ||||
| ca_maxresponsesize_cached, the replier will return | ||||
| NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in <xref target="COMPOUND_Sizing_Issues" format="default"/>. | ||||
| </dd> | ||||
| <dt>ca_maxoperations:</dt> | ||||
| <dd> | ||||
| The maximum number of operations the replier | ||||
| will accept in a COMPOUND or CB_COMPOUND. | ||||
| For the backchannel, the server <bcp14>MUST NOT</bcp14> change the value the | ||||
| client offers. For the fore channel, the server | ||||
| <bcp14>MAY</bcp14> change the requested value. | ||||
| After the session is created, if a requester sends a | ||||
| COMPOUND or CB_COMPOUND | ||||
| with more operations than ca_maxoperations, | ||||
| the replier <bcp14>MUST</bcp14> return NFS4ERR_TOO_MANY_OPS. | ||||
| </dd> | ||||
| <dt>ca_maxrequests:</dt> | ||||
| <dd> | ||||
| The maximum number of concurrent COMPOUND or CB_COMPOUND | ||||
| requests the requester will send on the session. Subsequent | ||||
| requests will each be assigned a slot identifier by the requester | ||||
| within the range zero to ca_maxrequests - 1 inclusive. | ||||
| For the backchannel, the server <bcp14>MUST NOT</bcp14> change the value the | ||||
| client offers. For the fore channel, the server | ||||
| <bcp14>MAY</bcp14> change the requested value. | ||||
| </dd> | ||||
| <dt>ca_rdma_ird:</dt> | ||||
| <dd> | ||||
| This array has a maximum of one element. | ||||
| If this array has one element, then the element contains the | ||||
| inbound RDMA read queue depth (IRD). | ||||
| For each channel, the server <bcp14>MAY</bcp14> decrease this value, but <bcp14>MUST | ||||
| NOT</bcp14> increase it. | ||||
| </dd></dl></dd> | ||||
| <dt>csa_cb_program</dt> | ||||
| <dd> | ||||
| This is the ONC RPC program number the server <bcp14>MUST</bcp14> use in | ||||
| any callbacks sent through the backchannel to the client. | ||||
| The server <bcp14>MUST</bcp14> specify an ONC RPC program number equal to | ||||
| csa_cb_program and an ONC RPC version number equal to 4 in | ||||
| callbacks sent to the client. If a CB_COMPOUND is | ||||
| sent to the client, the server <bcp14>MUST</bcp14> use a minor version | ||||
| number of 1. | ||||
| There is no corresponding result. | ||||
| </dd> | ||||
| <dt>csa_sec_parms</dt> | ||||
| <dd> | ||||
| <t> | ||||
| The field csa_sec_parms is an array of acceptable | ||||
| security credentials the server can use on | ||||
| the session's backchannel. Three security | ||||
| flavors are supported: AUTH_NONE, AUTH_SYS, | ||||
| and RPCSEC_GSS. If AUTH_NONE is specified for | ||||
| a credential, then this says the client is | ||||
| authorizing the server to use AUTH_NONE on | ||||
| all callbacks for the session. If AUTH_SYS | ||||
| is specified, then the client is authorizing | ||||
| the server to use AUTH_SYS on all callbacks, | ||||
| using the credential specified in cbsp_sys_cred. If | ||||
| RPCSEC_GSS is specified, then the server is | ||||
| allowed to use the RPCSEC_GSS context specified | ||||
| in cbsp_gss_parms as the RPCSEC_GSS context in | ||||
| the credential of the RPC header of callbacks | ||||
| to the client. | ||||
| There is no corresponding result. | ||||
| </t> | ||||
| <t> | ||||
| The RPCSEC_GSS context for the backchannel is specified via | ||||
| a pair of values of data type | ||||
| gsshandle4_t. The data type gsshandle4_t represents an | ||||
| RPCSEC_GSS handle, and is | ||||
| precisely the same as the data type of the "handle" field of | ||||
| the rpc_gss_init_res data type defined in "Context Creation Response | ||||
| - Successful Acceptance", <xref target="RFC2203" sectionFormat="of" section="5.2.3.1"/>. | ||||
| </t> | ||||
| <t> | ||||
| The first RPCSEC_GSS handle, gcbp_handle_from_server, | ||||
| is the fore handle the server returned to | ||||
| the client (either in the handle field of data type | ||||
| rpc_gss_init_res or as one of the elements of the spi_handles | ||||
| field returned in the reply to EXCHANGE_ID) when the RPCSEC_GSS context | ||||
| was created on the server. The second handle, | ||||
| gcbp_handle_from_client, is the back handle to which the | ||||
| client will map the RPCSEC_GSS context. The | ||||
| server can immediately use the value of | ||||
| gcbp_handle_from_client in the RPCSEC_GSS credential | ||||
| in callback RPCs. That is, the value in | ||||
| gcbp_handle_from_client can be used as the | ||||
| value of the field "handle" in data type | ||||
| rpc_gss_cred_t (see "Elements of | ||||
| the RPCSEC_GSS Security Protocol", <xref target="RFC2203" sectionFormat="of" section="5"/>) in callback RPCs. | ||||
| The server <bcp14>MUST</bcp14> use the RPCSEC_GSS security service | ||||
| specified in gcbp_service, i.e., it <bcp14>MUST</bcp14> set the | ||||
| "service" field of the rpc_gss_cred_t data type in | ||||
| RPCSEC_GSS credential to the value of gcbp_service (see | ||||
| "RPC Request Header", <xref target="RFC2203" sectionFormat="of" section="5.3.1"/>). | ||||
| </t> | ||||
| <t> | ||||
| If the RPCSEC_GSS handle identified by | ||||
| gcbp_handle_from_server does not exist on the server, | ||||
| the server will return NFS4ERR_NOENT. | ||||
| </t> | ||||
| <t> | ||||
| Within each element of csa_sec_parms, the fore and back RPCSEC_GSS contexts <bcp14>MUST</bcp14> | ||||
| share the same GSS context | ||||
| and <bcp14>MUST</bcp14> have the same seq_window | ||||
| (see Section <xref target="RFC2203" sectionFormat="bare" section="5.2.3.1"/> | ||||
| of RFC 2203 <xref target="RFC2203" format="default"/>). | ||||
| The fore and back RPCSEC_GSS context states | ||||
| are independent of each other with respect to the | ||||
| RPCSEC_GSS sequence number (see the seq_num | ||||
| field in the rpc_gss_cred_t data type of Sections | ||||
| <xref target="RFC2203" sectionFormat="bare" section="5"/> and | ||||
| <xref target="RFC2203" sectionFormat="bare" section="5.3.1"/> of | ||||
| <xref target="RFC2203" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| If an RPCSEC_GSS handle is using the SSV context (see <xref target="ssv_mech" format="default"/>), then because each SSV RPCSEC_GSS | ||||
| handle shares a common SSV GSS context, there are security | ||||
| considerations specific to this situation discussed in <xref target="rpcsec_ssv_consider" format="default"/>. | ||||
| </t> | ||||
| </dd> | ||||
| </dl> | ||||
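The negotiation rules for csa_flags and the channel attributes can be sketched as follows. This is an illustration under stated assumptions, not a prescribed algorithm: `ChannelAttrs`, `negotiate`, `negotiate_flags`, and the idea of a server-side `limits` structure are hypothetical, ca_rdma_ird is omitted, and lowering ca_maxoperations and ca_maxrequests on the fore channel is only one of the changes the text permits.

```python
from dataclasses import dataclass

# Flag values as given in the ARGUMENT section.
CREATE_SESSION4_FLAG_PERSIST = 0x00000001
CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002
CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004


@dataclass(frozen=True)
class ChannelAttrs:
    """Subset of channel_attrs4 (ca_rdma_ird omitted for brevity)."""
    headerpadsize: int
    maxrequestsize: int
    maxresponsesize: int
    maxresponsesize_cached: int
    maxoperations: int
    maxrequests: int


def negotiate(offered: ChannelAttrs, limits: ChannelAttrs,
              backchannel: bool) -> ChannelAttrs:
    """Return the attributes the session will use for one channel.

    'limits' is a hypothetical set of server-side maxima. Sizes may be
    decreased but never increased, so each result is min(offered, limit).
    """
    def lower(o: int, lim: int) -> int:
        return min(o, lim)

    return ChannelAttrs(
        headerpadsize=lower(offered.headerpadsize, limits.headerpadsize),
        maxrequestsize=lower(offered.maxrequestsize, limits.maxrequestsize),
        maxresponsesize=lower(offered.maxresponsesize, limits.maxresponsesize),
        maxresponsesize_cached=lower(offered.maxresponsesize_cached,
                                     limits.maxresponsesize_cached),
        # On the backchannel the server must not change these two values.
        # On the fore channel it may; lowering them is an assumed policy.
        maxoperations=(offered.maxoperations if backchannel
                       else lower(offered.maxoperations, limits.maxoperations)),
        maxrequests=(offered.maxrequests if backchannel
                     else lower(offered.maxrequests, limits.maxrequests)),
    )


def negotiate_flags(csa_flags: int, server_supported: int) -> int:
    """csr_flags never contains a flag the client did not set in csa_flags."""
    return csa_flags & server_supported
```

Masking the requested flags with the supported set guarantees that a flag absent from csa_flags is never returned in csr_flags.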
| <!-- [auth] sg check --> | ||||
| <t> | ||||
| Once the session is created, the first SEQUENCE or | ||||
| CB_SEQUENCE received on a slot <bcp14>MUST</bcp14> have a sequence | ||||
| ID equal to 1; if not, the replier <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_SEQ_MISORDERED. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CREATE_SESSION_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| To describe a possible implementation, the same notation for client | ||||
| records introduced in the description of EXCHANGE_ID is used | ||||
| with the following addition: | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| clientid_arg: | ||||
| The value of the csa_clientid field of the CREATE_SESSION4args | ||||
| structure of the current request. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Since CREATE_SESSION is a non-idempotent operation, we | ||||
| need to consider the possibility that retries may occur | ||||
| as a result of a client restart, network partition, | ||||
| malfunctioning router, etc. For each client ID | ||||
| created by EXCHANGE_ID, the server maintains a | ||||
| separate reply cache (called the CREATE_SESSION reply cache) | ||||
| similar to the session reply | ||||
| cache used for SEQUENCE operations, with two | ||||
| distinctions. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| First, this is a reply cache just for | ||||
| detecting and processing CREATE_SESSION requests for a | ||||
| given client ID. | ||||
| </li> | ||||
| <li> | ||||
| Second, the size of the client ID | ||||
| reply cache is one slot (and as a result, the | ||||
| CREATE_SESSION request does not carry a slot number). | ||||
| This means that at most one CREATE_SESSION request for | ||||
| a given client ID can be outstanding. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As previously stated, CREATE_SESSION can be sent with | ||||
| or without a preceding SEQUENCE operation. Even if a | ||||
| SEQUENCE precedes CREATE_SESSION, the server <bcp14>MUST</bcp14> | ||||
| maintain the CREATE_SESSION reply cache, which | ||||
| is separate from the reply cache for the session | ||||
| associated with a SEQUENCE. If CREATE_SESSION was | ||||
| originally sent by itself, the client <bcp14>MAY</bcp14> send | ||||
| a retry of the CREATE_SESSION operation within a | ||||
| COMPOUND preceded by a SEQUENCE. If CREATE_SESSION | ||||
| was originally sent in a COMPOUND that started with a | ||||
| SEQUENCE, then the client <bcp14>SHOULD</bcp14> send a retry in | ||||
| a COMPOUND that starts with a SEQUENCE that has the | ||||
| same session ID as the SEQUENCE of the original | ||||
| request. However, the client <bcp14>MAY</bcp14> send a retry in a | ||||
| COMPOUND that either has no preceding SEQUENCE, or | ||||
| has a preceding SEQUENCE that refers to a different | ||||
| session than the original CREATE_SESSION. This might | ||||
| be necessary if the client sends a CREATE_SESSION | ||||
| in a COMPOUND preceded by a SEQUENCE with session | ||||
| ID X, and session X no longer exists. Regardless, any | ||||
| retry of CREATE_SESSION, with or without a preceding | ||||
| SEQUENCE, <bcp14>MUST</bcp14> use the same value of csa_sequence | ||||
| as the original. | ||||
| </t> | ||||
| <t> | ||||
| After the client receives a reply to an EXCHANGE_ID operation that contains | ||||
| a new, unconfirmed client ID, | ||||
| the server expects the client to follow | ||||
| with a CREATE_SESSION operation to confirm the client ID. The | ||||
| server expects the value of csa_sequence in the arguments to | ||||
| that CREATE_SESSION | ||||
| to equal the value of the field eir_sequenceid that was returned in | ||||
| the results of the EXCHANGE_ID that returned the unconfirmed | ||||
| client ID. | ||||
| Before the server replies to that EXCHANGE_ID operation, | ||||
| it initializes the client ID slot to be equal | ||||
| to eir_sequenceid - 1 (accounting for underflow), | ||||
| and records a contrived CREATE_SESSION result | ||||
| with a "cached" result of NFS4ERR_SEQ_MISORDERED. | ||||
| With the client ID slot thus initialized, the processing of the | ||||
| CREATE_SESSION operation is divided into four phases: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| Client record lookup. The server looks up the client ID | ||||
| in its client record table. | ||||
| If the server contains no records | ||||
| with client ID equal to clientid_arg, then most | ||||
| likely the client's state has been purged during a | ||||
| period of inactivity, possibly due to a loss of | ||||
| connectivity. NFS4ERR_STALE_CLIENTID is returned, | ||||
| and no changes are made to any client records on | ||||
| the server. Otherwise, the server goes to phase 2. | ||||
| </li> | ||||
| <li> | ||||
| Sequence ID processing. If csa_sequence is equal to the | ||||
| sequence ID in the client ID's slot, then this is a replay | ||||
| of the previous CREATE_SESSION request, and the server | ||||
| returns the cached result. | ||||
| If csa_sequence is not equal to the sequence ID in the slot, | ||||
| and is more than one greater (accounting for wraparound), | ||||
| then the server returns the error NFS4ERR_SEQ_MISORDERED, | ||||
| and does not change the slot. If csa_sequence is | ||||
| equal to the slot's sequence ID + 1 (accounting for | ||||
| wraparound), then the slot's sequence ID is set to | ||||
| csa_sequence, and the CREATE_SESSION processing goes to | ||||
| the next phase. A subsequent new CREATE_SESSION call | ||||
| over the same client ID <bcp14>MUST</bcp14> | ||||
| use a csa_sequence that is one greater than the | ||||
| sequence ID in the slot. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Client ID confirmation. If this would be the first session for the | ||||
| client ID, the CREATE_SESSION operation serves to confirm the | ||||
| client ID. | ||||
| Otherwise, | ||||
| the client ID confirmation phase is skipped and only | ||||
| the session creation phase occurs. | ||||
| Any case in which there is more than one | ||||
| record with identical values for client ID represents | ||||
| a server implementation error. | ||||
| Operation in the | ||||
| potential valid cases is summarized as follows. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t>Successful Confirmation | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| If the server has the following unconfirmed record, then this | ||||
| is the expected confirmation of an unconfirmed record. | ||||
| </li> | ||||
| <li> | ||||
| { ownerid, verifier, principal_arg, clientid_arg, unconfirmed } | ||||
| </li> | ||||
| <li> | ||||
| As noted in <xref target="OP_EXCHANGE_ID_IMPLEMENTATION" format="default"/>, | ||||
| the server might also have the following confirmed record. | ||||
| </li> | ||||
| <li> | ||||
| { ownerid, old_verifier, principal_arg, old_clientid, confirmed } | ||||
| </li> | ||||
| <li> | ||||
| The server schedules the replacement of both records with: | ||||
| </li> | ||||
| <li> | ||||
| { ownerid, verifier, principal_arg, clientid_arg, confirmed } | ||||
| </li> | ||||
| <li> | ||||
| The processing of CREATE_SESSION continues on to session creation. | ||||
| Once the session is successfully created, the scheduled client | ||||
| record replacement is committed. If the session is not | ||||
| successfully created, then no changes are made to any client | ||||
| records on the server. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t>Unsuccessful Confirmation | ||||
| </t> | ||||
| <ul empty="true" spacing="normal"> | ||||
| <li> | ||||
| If the server has the following record, then the client has | ||||
| changed principals after the previous EXCHANGE_ID request, | ||||
| or there has been a chance collision between shorthand client | ||||
| identifiers. | ||||
| </li> | ||||
| <li> | ||||
| { *, *, old_principal_arg, clientid_arg, * } | ||||
| </li> | ||||
| <li> | ||||
| Neither of these cases is permissible. Processing stops and | ||||
| NFS4ERR_CLID_INUSE is returned to the client. No changes are | ||||
| made to any client records on the server. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Session creation. | ||||
| The server has confirmed the client ID, either in this | ||||
| CREATE_SESSION operation, or a previous CREATE_SESSION | ||||
| operation. | ||||
| The server examines the remaining fields of the arguments. | ||||
| </t> | ||||
| <t> | ||||
| The server creates the session by recording the | ||||
| parameter values used (including whether the | ||||
| CREATE_SESSION4_FLAG_PERSIST flag is set and has | ||||
| been accepted by the server) and allocating space | ||||
| for the session reply cache (if there is not enough | ||||
| space, the server returns NFS4ERR_NOSPC). For each slot in the | ||||
| reply cache, the server sets the sequence ID to zero, | ||||
| and records an entry containing a COMPOUND | ||||
| reply with zero operations and the error | ||||
| NFS4ERR_SEQ_MISORDERED. This way, if the first | ||||
| SEQUENCE request sent has a sequence ID equal to | ||||
| zero, the server can simply return what is in the | ||||
| reply cache: NFS4ERR_SEQ_MISORDERED. The client | ||||
| initializes its reply cache for receiving callbacks | ||||
| in the same way, and similarly, the first CB_SEQUENCE | ||||
| operation on a slot after session creation <bcp14>MUST</bcp14> have | ||||
| a sequence ID of one. | ||||
| </t> | ||||
| <t> | ||||
| If the session state is created successfully, the server associates | ||||
| the session with the client ID provided by the client. | ||||
| </t> | ||||
| <t> | ||||
| When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set | ||||
| needs to be retried, the retry | ||||
| <bcp14>MUST</bcp14> be done on a new connection that is in non-RDMA mode. | ||||
| If properties of the new connection are different enough | ||||
| that the arguments to CREATE_SESSION need to change, then | ||||
| a non-retry <bcp14>MUST</bcp14> be sent. The server will eventually dispose | ||||
| of any session that was created on the original connection. | ||||
| </t> | ||||
| </li> | ||||
| </ol> | ||||
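The first three phases can be sketched as below. This is a simplified model rather than a definitive implementation: `ClientRecord`, `new_unconfirmed`, and `create_session` are hypothetical names, the record table is assumed to be a dictionary keyed by client ID, results are modeled as status names only, session creation (phase 4) is assumed to succeed, and caching the result of every non-replay request in the client ID slot is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict

SEQ_MOD = 2 ** 32  # sequenceid4 is modeled as a 32-bit unsigned counter


@dataclass
class ClientRecord:
    ownerid: bytes
    verifier: bytes
    principal: str
    clientid: int
    confirmed: bool
    slot_seqid: int    # sequence ID held in the client ID's single slot
    slot_cached: str   # cached CREATE_SESSION result for that slot


def new_unconfirmed(ownerid: bytes, verifier: bytes, principal: str,
                    clientid: int, eir_sequenceid: int) -> ClientRecord:
    """Slot state set up before the server replies to EXCHANGE_ID."""
    return ClientRecord(ownerid, verifier, principal, clientid, False,
                        (eir_sequenceid - 1) % SEQ_MOD,  # handles underflow
                        "NFS4ERR_SEQ_MISORDERED")        # contrived cached result


def create_session(table: Dict[int, ClientRecord], clientid_arg: int,
                   csa_sequence: int, principal_arg: str) -> str:
    # Phase 1: client record lookup.
    rec = table.get(clientid_arg)
    if rec is None:
        return "NFS4ERR_STALE_CLIENTID"
    # Phase 2: sequence ID processing.
    if csa_sequence == rec.slot_seqid:
        return rec.slot_cached                 # replay: return cached result
    if csa_sequence != (rec.slot_seqid + 1) % SEQ_MOD:
        return "NFS4ERR_SEQ_MISORDERED"        # slot left unchanged
    rec.slot_seqid = csa_sequence
    # Phase 3: client ID confirmation.
    if rec.principal != principal_arg:
        rec.slot_cached = "NFS4ERR_CLID_INUSE"  # assumption: errors are cached
        return rec.slot_cached
    if not rec.confirmed:
        # Replace any older confirmed record for the same owner.
        for old_id, old in list(table.items()):
            if old_id != clientid_arg and old.confirmed and old.ownerid == rec.ownerid:
                del table[old_id]
        rec.confirmed = True
    # Phase 4 (session creation) is omitted and assumed to succeed.
    rec.slot_cached = "NFS4_OK"
    return rec.slot_cached
```

Because the slot starts at eir_sequenceid - 1 with a cached NFS4ERR_SEQ_MISORDERED, a first request that carries the wrong sequence ID is rejected by the ordinary replay and ordering checks without any special case.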
| <t> | ||||
| On the backchannel, the client and server might wish to | ||||
| have many slots, in some cases perhaps more than the fore channel, in | ||||
| order to deal with situations where the | ||||
| network link has high latency and is the primary | ||||
| bottleneck for response to recalls. If so, and if the | ||||
| client provides too few slots to the backchannel, | ||||
| the server might limit the number of recallable | ||||
| objects it gives to the client. | ||||
| </t> | ||||
| <t> | ||||
| Implementing RPCSEC_GSS callback support requires | ||||
| changes to both the client and server implementations of | ||||
| RPCSEC_GSS. One possible set of changes includes: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Adding a data structure that wraps the GSS-API | ||||
| context with a reference count. | ||||
| </li> | ||||
| <li> | ||||
| Adding functions to increment and decrement the reference | ||||
| count. If the reference count is decremented to zero, | ||||
| the wrapper data structure and the GSS-API context it | ||||
| refers to would be freed. | ||||
| </li> | ||||
| <li> | ||||
| Change RPCSEC_GSS to create the wrapper data | ||||
| structure upon receiving GSS-API context from | ||||
| gss_accept_sec_context() and gss_init_sec_context(). | ||||
| The reference count would be initialized to 1. | ||||
| </li> | ||||
| <li> | ||||
| Adding a function to map an existing | ||||
| RPCSEC_GSS handle to a pointer to the wrapper data | ||||
| structure. The reference count would be incremented. | ||||
| </li> | ||||
| <li> | ||||
| Adding a function to create a new RPCSEC_GSS | ||||
| handle from a pointer to the wrapper data structure. | ||||
| The reference count would be incremented. | ||||
| </li> | ||||
| <li> | ||||
| Replacing calls from RPCSEC_GSS that free GSS-API | ||||
| contexts, with calls to decrement the reference count | ||||
| on the wrapper data structure. | ||||
| </li> | ||||
| </ul> | ||||
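| <t> | ||||
| The reference-counting scheme listed above can be sketched as follows. | ||||
| This is a minimal illustration in Python, not RPCSEC_GSS code: the class | ||||
| GssContextRef, its acquire and release methods, and the free_context | ||||
| callback are hypothetical names that stand in for the wrapper data | ||||
| structure and for whatever routine frees the GSS-API context. | ||||
| </t> | ||||

```python
class GssContextRef:
    """Hypothetical wrapper pairing a GSS-API context with a reference count."""

    def __init__(self, context, free_context):
        self.context = context              # opaque GSS-API context (assumed)
        self._free_context = free_context   # invoked once, when the count hits zero
        self.refcount = 1                   # initialized to 1 on creation

    def acquire(self):
        """Take another reference, e.g. when a handle is mapped to the wrapper."""
        if self.refcount == 0:
            raise RuntimeError("context already freed")
        self.refcount += 1
        return self

    def release(self):
        """Drop a reference; free the context when the count reaches zero."""
        if self.refcount == 0:
            raise RuntimeError("context already freed")
        self.refcount -= 1
        if self.refcount == 0:
            self._free_context(self.context)
            self.context = None


freed = []
ref = GssContextRef("ctx-A", freed.append)   # count is 1
ref.acquire()                                # a second handle now shares the context
ref.release()                                # the first handle goes away
still_alive = (ref.context == "ctx-A")       # context survives: one reference left
ref.release()                                # last reference dropped, context freed
```

| <t> | ||||
| In this sketch, the context outlives the first handle because the | ||||
| second handle still holds a reference to it. | ||||
| </t> | ||||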
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_DESTROY_SESSION" numbered="true" toc="default"> | ||||
| <name>Operation 44: DESTROY_SESSION - Destroy a Session</name> | ||||
| <section toc="exclude" anchor="OP_DESTROY_SESSION_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DESTROY_SESSION4args { | ||||
| sessionid4 dsa_sessionid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DESTROY_SESSION_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DESTROY_SESSION4res { | ||||
| nfsstat4 dsr_status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DESTROY_SESSION_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The DESTROY_SESSION operation closes the session and discards | ||||
| the session's reply cache, if any. | ||||
| Any remaining connections associated with the session are | ||||
| immediately disassociated. If the connection has no remaining | ||||
| associated sessions, the connection | ||||
| <bcp14>MAY</bcp14> be closed by the server. | ||||
| Locks, delegations, layouts, wants, and the lease, which are all | ||||
| tied to the client ID, are not affected by DESTROY_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| DESTROY_SESSION <bcp14>MUST</bcp14> be invoked on a connection that | ||||
| is associated with the session being destroyed. | ||||
| In addition, if SP4_MACH_CRED state protection | ||||
| was specified when the client ID was created, | ||||
| the RPCSEC_GSS principal that created the session <bcp14>MUST</bcp14> be | ||||
| the one that destroys the session, using RPCSEC_GSS | ||||
| privacy or integrity. If SP4_SSV state protection was | ||||
| specified when the client ID was created, RPCSEC_GSS | ||||
| using the SSV mechanism (<xref target="ssv_mech" format="default"/>) | ||||
| <bcp14>MUST</bcp14> be used, with integrity or privacy. | ||||
| </t> | ||||
| <t> | ||||
| If the COMPOUND request starts with SEQUENCE, and | ||||
| if the sessionids specified in SEQUENCE and DESTROY_SESSION | ||||
| are the same, then | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| DESTROY_SESSION <bcp14>MUST</bcp14> be the final operation in the COMPOUND | ||||
| request. | ||||
| </li> | ||||
| <li> | ||||
| It is advisable to avoid placing DESTROY_SESSION in a | ||||
| COMPOUND request with other state-modifying | ||||
| operations, because the DESTROY_SESSION will destroy | ||||
| the reply cache. | ||||
| </li> | ||||
| <li> | ||||
| Because the session and its reply cache are destroyed, a client that | ||||
| retries the request may receive an error in | ||||
| reply to the retry, even though the original request was | ||||
| successful. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| If the COMPOUND request starts with SEQUENCE, and | ||||
| if the sessionids specified in SEQUENCE and DESTROY_SESSION | ||||
| are different, then DESTROY_SESSION can appear in any position | ||||
| of the COMPOUND request (except for the first position). The | ||||
| two sessionids can belong to different client IDs. | ||||
| </t> | ||||
| <t> | ||||
| If the COMPOUND request does not start with | ||||
| SEQUENCE, and if DESTROY_SESSION is not the | ||||
| sole operation, then the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_NOT_ONLY_OP. | ||||
| </t> | ||||
| <t> | ||||
| If there is a backchannel on the session and the | ||||
| server has outstanding CB_COMPOUND operations for the | ||||
| session which have not been replied to, then the server | ||||
| <bcp14>MAY</bcp14> refuse to destroy the session and return an error. | ||||
| If so, then | ||||
| in the event the backchannel is down, the server | ||||
| <bcp14>SHOULD</bcp14> return NFS4ERR_CB_PATH_DOWN to inform the | ||||
| client that the backchannel needs to be repaired before | ||||
| the server will allow the session to be destroyed. | ||||
| Otherwise, the error NFS4ERR_BACK_CHAN_BUSY <bcp14>SHOULD</bcp14> be | ||||
| returned to indicate that there are CB_COMPOUNDs | ||||
| that need to be replied to. The client <bcp14>SHOULD</bcp14> reply | ||||
| to all outstanding CB_COMPOUNDs before re-sending | ||||
| DESTROY_SESSION. | ||||
| </t> | ||||
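| <t> | ||||
| The placement rules for DESTROY_SESSION within a COMPOUND can be | ||||
| summarized by the following Python sketch. The function name, the tuple | ||||
| representation of operations, and the "not-final" label are assumptions | ||||
| made for illustration; "not-final" is not a protocol error code, since | ||||
| this section does not name the error for that case. | ||||
| </t> | ||||

```python
def check_destroy_session_placement(ops):
    """ops: list of (opname, sessionid) tuples; sessionid is None for other ops.

    Returns "ok", "NFS4ERR_NOT_ONLY_OP", or "not-final" (illustrative label).
    Only the first DESTROY_SESSION in the request is examined.
    """
    names = [name for name, _ in ops]
    idx = names.index("DESTROY_SESSION")
    if names[0] != "SEQUENCE":
        # Without a leading SEQUENCE, DESTROY_SESSION must be the sole operation.
        return "ok" if len(ops) == 1 else "NFS4ERR_NOT_ONLY_OP"
    same_session = ops[0][1] == ops[idx][1]
    if same_session and idx != len(ops) - 1:
        # Destroying the session named in SEQUENCE: must be the final operation.
        return "not-final"
    # Different sessions: any position after SEQUENCE is acceptable.
    return "ok"


last_op = check_destroy_session_placement(
    [("SEQUENCE", "s1"), ("PUTROOTFH", None), ("DESTROY_SESSION", "s1")])
mid_same = check_destroy_session_placement(
    [("SEQUENCE", "s1"), ("DESTROY_SESSION", "s1"), ("PUTROOTFH", None)])
mid_other = check_destroy_session_placement(
    [("SEQUENCE", "s1"), ("DESTROY_SESSION", "s2"), ("PUTROOTFH", None)])
no_seq = check_destroy_session_placement(
    [("DESTROY_SESSION", "s1"), ("PUTROOTFH", None)])
alone = check_destroy_session_placement([("DESTROY_SESSION", "s1")])
```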
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_FREE_STATEID" numbered="true" toc="default"> | ||||
| <name>Operation 45: FREE_STATEID - Free Stateid with No Locks</name> | ||||
| <section toc="exclude" anchor="OP_FREE_STATEID_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct FREE_STATEID4args { | ||||
| stateid4 fsa_stateid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_FREE_STATID_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct FREE_STATEID4res { | ||||
| nfsstat4 fsr_status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_FREE_STATEID4_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The FREE_STATEID operation is used to free a stateid that no longer | ||||
| has any associated locks (including opens, byte-range locks, delegations, | ||||
| and layouts). This may be because of client LOCKU operations or because | ||||
| of server revocation. If there are valid locks (of any kind) | ||||
| associated with the stateid in question, the error NFS4ERR_LOCKS_HELD | ||||
| will be returned, and the associated stateid will not be freed. | ||||
| </t> | ||||
| <t> | ||||
| When a stateid is freed that had been associated with revoked locks, | ||||
| by sending the FREE_STATEID operation, the client acknowledges the loss of those | ||||
| locks. This allows the server, once all such revoked state is | ||||
| acknowledged, | ||||
| to allow that client again to reclaim locks, without encountering | ||||
| the edge conditions discussed in <xref target="server_failure" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Once a successful FREE_STATEID is done for a given stateid, any | ||||
| subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID | ||||
| error. | ||||
| </t> | ||||
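| <t> | ||||
| The outcomes described above can be illustrated with a small Python | ||||
| model of a server's stateid table. The class StateidTable and its | ||||
| methods are hypothetical; a real server tracks far more state, and the | ||||
| treatment of an unknown stateid as NFS4ERR_BAD_STATEID is an assumption | ||||
| of this sketch. | ||||
| </t> | ||||

```python
class StateidTable:
    """Hypothetical model: maps each stateid to its count of valid locks."""

    def __init__(self):
        self._valid_locks = {}

    def add(self, stateid, valid_locks):
        self._valid_locks[stateid] = valid_locks

    def unlock_or_revoke_all(self, stateid):
        # After LOCKU operations or server revocation, no valid locks remain.
        self._valid_locks[stateid] = 0

    def free_stateid(self, stateid):
        if stateid not in self._valid_locks:
            return "NFS4ERR_BAD_STATEID"     # assumption for unknown stateids
        if self._valid_locks[stateid] > 0:
            return "NFS4ERR_LOCKS_HELD"      # stateid is not freed
        del self._valid_locks[stateid]
        return "NFS4_OK"

    def use(self, stateid):
        # Any later use of a freed stateid fails.
        return "NFS4_OK" if stateid in self._valid_locks else "NFS4ERR_BAD_STATEID"


table = StateidTable()
table.add("st1", valid_locks=2)
held = table.free_stateid("st1")         # locks still valid
table.unlock_or_revoke_all("st1")
freed = table.free_stateid("st1")        # no valid locks remain
after = table.use("st1")                 # stateid no longer exists
```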
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_GET_DIR_DELEGATION" numbered="true" toc="default"> | ||||
| <name>Operation 46: GET_DIR_DELEGATION - Get a Directory Delegation</name> | ||||
| <section toc="exclude" anchor="OP_GET_DIR_DELEGATION_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| typedef nfstime4 attr_notice4; | ||||
| struct GET_DIR_DELEGATION4args { | ||||
| /* CURRENT_FH: delegated directory */ | ||||
| bool gdda_signal_deleg_avail; | ||||
| bitmap4 gdda_notification_types; | ||||
| attr_notice4 gdda_child_attr_delay; | ||||
| attr_notice4 gdda_dir_attr_delay; | ||||
| bitmap4 gdda_child_attributes; | ||||
| bitmap4 gdda_dir_attributes; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GET_DIR_DELEGATION_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GET_DIR_DELEGATION4resok { | ||||
| verifier4 gddr_cookieverf; | ||||
| /* Stateid for get_dir_delegation */ | ||||
| stateid4 gddr_stateid; | ||||
| /* Which notifications can the server support */ | ||||
| bitmap4 gddr_notification; | ||||
| bitmap4 gddr_child_attributes; | ||||
| bitmap4 gddr_dir_attributes; | ||||
| }; | ||||
| enum gddrnf4_status { | ||||
| GDD4_OK = 0, | ||||
| GDD4_UNAVAIL = 1 | ||||
| }; | ||||
| union GET_DIR_DELEGATION4res_non_fatal | ||||
| switch (gddrnf4_status gddrnf_status) { | ||||
| case GDD4_OK: | ||||
| GET_DIR_DELEGATION4resok gddrnf_resok4; | ||||
| case GDD4_UNAVAIL: | ||||
| bool gddrnf_will_signal_deleg_avail; | ||||
| }; | ||||
| union GET_DIR_DELEGATION4res | ||||
| switch (nfsstat4 gddr_status) { | ||||
| case NFS4_OK: | ||||
| GET_DIR_DELEGATION4res_non_fatal gddr_res_non_fatal4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GET_DIR_DELEGATION_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The GET_DIR_DELEGATION operation is used by a client to request | ||||
| a directory delegation. The directory is represented by the | ||||
| current filehandle. The client also specifies whether it wants | ||||
| the server to notify it when the directory changes in certain | ||||
| ways by setting one or more bits in a bitmap. The server may | ||||
| refuse to grant the delegation. In that case, the server | ||||
| will return NFS4ERR_DIRDELEG_UNAVAIL. If the server decides to | ||||
| hand out the delegation, it will return a cookie verifier for | ||||
| that directory. If the cookie verifier changes when the client | ||||
| is holding the delegation, the delegation will be recalled | ||||
| unless the client has asked for notification for this event. | ||||
| </t> | ||||
| <t> | ||||
| The server will also return a directory delegation stateid, | ||||
| gddr_stateid, as a result of the | ||||
| GET_DIR_DELEGATION operation. This stateid will appear in | ||||
| callback messages related to the delegation, such as | ||||
| notifications and delegation recalls. The client will use this | ||||
| stateid to return the delegation voluntarily or upon recall. A | ||||
| delegation is returned by calling the DELEGRETURN operation. | ||||
| </t> | ||||
| <t> | ||||
| The server might not be able to support notifications of certain | ||||
| events. If the client asks for such notifications, the server | ||||
| <bcp14>MUST</bcp14> inform the client of its inability to do so as part of the | ||||
| GET_DIR_DELEGATION reply by not setting the appropriate bits in | ||||
| the supported notifications bitmask, gddr_notification, contained | ||||
| in the reply. The server <bcp14>MUST NOT</bcp14> add bits to gddr_notification | ||||
| that the client did not request. | ||||
| </t> | ||||
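| <t> | ||||
| The rule for building gddr_notification amounts to intersecting what | ||||
| the client requested with what the server supports. The Python sketch | ||||
| below models each bitmap as a set of notification names rather than as | ||||
| a bitmap4; the function name grant_notifications and the example | ||||
| "supported" set are assumptions made for illustration. | ||||
| </t> | ||||

```python
def grant_notifications(requested, supported):
    """Return the notifications to report in gddr_notification.

    Requested notifications the server cannot support are left out, and
    nothing the client did not request is ever added.
    """
    return frozenset(requested).intersection(supported)


requested = {"NOTIFY4_ADD_ENTRY", "NOTIFY4_REMOVE_ENTRY", "NOTIFY4_RENAME_ENTRY"}
supported = {"NOTIFY4_ADD_ENTRY", "NOTIFY4_REMOVE_ENTRY",
             "NOTIFY4_CHANGE_COOKIE_VERIFIER"}   # hypothetical server capability
granted = grant_notifications(requested, supported)
```

| <t> | ||||
| Here the reply omits NOTIFY4_RENAME_ENTRY, which this example server | ||||
| does not support, and does not include NOTIFY4_CHANGE_COOKIE_VERIFIER, | ||||
| which the client did not request. | ||||
| </t> | ||||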
| <t> | ||||
| The GET_DIR_DELEGATION operation can be used for both normal and | ||||
| named attribute directories. | ||||
| </t> | ||||
| <t> | ||||
| If the client sets gdda_signal_deleg_avail to TRUE, then it is | ||||
| registering with the server a "want" for a directory | ||||
| delegation. If the delegation is not available, and the server | ||||
| supports and will honor the "want", | ||||
| the results will have gddrnf_will_signal_deleg_avail set to TRUE | ||||
| and no error will be indicated on return. | ||||
| If so, the client should expect a future CB_RECALLABLE_OBJ_AVAIL | ||||
| operation to indicate that a directory delegation is available. | ||||
| If the server does not wish to honor the "want" or is not able | ||||
| to do so, it returns the error NFS4ERR_DIRDELEG_UNAVAIL. If the | ||||
| delegation is immediately available, the server <bcp14>SHOULD</bcp14> return it with | ||||
| the response to the operation, rather than via a callback. | ||||
| </t> | ||||
| <t> | ||||
| When a client makes a request for a | ||||
| directory delegation while it already holds | ||||
| a directory delegation for that directory | ||||
| (including the case where it has been | ||||
| recalled but not yet returned by the client | ||||
| or revoked by the server), the server <bcp14>MUST</bcp14> | ||||
| reply with the value of gddr_status set to | ||||
| NFS4_OK, the value of gddrnf_status set to | ||||
| GDD4_UNAVAIL, and the value of | ||||
| gddrnf_will_signal_deleg_avail set to | ||||
| FALSE. The delegation the client held | ||||
| before the request remains intact, and its | ||||
| state is unchanged. The current stateid is | ||||
| not changed (see <xref target="current_stateid" format="default"/> for a description | ||||
| of the current stateid). | ||||
| </t> | ||||
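| <t> | ||||
| The possible replies described in this section can be collected into | ||||
| one decision function. This Python sketch is illustrative only: the | ||||
| function gdd_reply and its boolean parameters are hypothetical, and the | ||||
| returned tuple holds gddr_status, gddrnf_status, and | ||||
| gddrnf_will_signal_deleg_avail, with None where a field is not present. | ||||
| </t> | ||||

```python
def gdd_reply(already_holds, available, signal_wanted, server_honors_want):
    """Return (gddr_status, gddrnf_status, gddrnf_will_signal_deleg_avail)."""
    if already_holds:
        # The client already holds a delegation for this directory; the
        # existing delegation and the current stateid are left unchanged.
        return ("NFS4_OK", "GDD4_UNAVAIL", False)
    if available:
        # Delegation granted directly in the response.
        return ("NFS4_OK", "GDD4_OK", None)
    if signal_wanted and server_honors_want:
        # "Want" registered; a CB_RECALLABLE_OBJ_AVAIL is expected later.
        return ("NFS4_OK", "GDD4_UNAVAIL", True)
    # Server refuses, or cannot honor the "want".
    return ("NFS4ERR_DIRDELEG_UNAVAIL", None, None)


duplicate = gdd_reply(True, True, True, True)
granted = gdd_reply(False, True, False, False)
deferred = gdd_reply(False, False, True, True)
refused = gdd_reply(False, False, True, False)
```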
| </section> | ||||
| <section toc="exclude" anchor="OP_GET_DIR_DELEGATION_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Directory delegations provide the benefit of improving cache | ||||
| consistency of namespace information. This is done through | ||||
| synchronous callbacks. A server must support synchronous | ||||
| callbacks in order to support directory delegations. In addition | ||||
| to that, asynchronous notifications provide a way to reduce | ||||
| network traffic as well as improve client performance in certain | ||||
| conditions. | ||||
| </t> | ||||
| <t> | ||||
| Notifications are specified in terms of potential | ||||
| changes to the directory. A client can ask to be | ||||
| notified of events by setting one or more | ||||
| bits in gdda_notification_types. | ||||
| The client can ask for notifications on addition of entries | ||||
| to a directory (by setting the | ||||
| NOTIFY4_ADD_ENTRY in gdda_notification_types), | ||||
| notifications on entry removal | ||||
| (NOTIFY4_REMOVE_ENTRY), renames | ||||
| (NOTIFY4_RENAME_ENTRY), directory attribute | ||||
| changes (NOTIFY4_CHANGE_DIR_ATTRIBUTES), | ||||
| and cookie verifier changes | ||||
| (NOTIFY4_CHANGE_COOKIE_VERIFIER) by setting | ||||
| one or more corresponding bits in the | ||||
| gdda_notification_types field. | ||||
| </t> | ||||
| <t> | ||||
| The client can also ask for | ||||
| notifications of changes to | ||||
| attributes of directory entries | ||||
| (NOTIFY4_CHANGE_CHILD_ATTRIBUTES) | ||||
| in order to keep its attribute cache up to date. However, any | ||||
| changes made to child attributes do not cause the delegation to | ||||
| be recalled. If a client is interested in directory entry | ||||
| caching or negative name caching, it can set the | ||||
| gdda_notification_types appropriately to its particular need | ||||
| and the server will notify it of | ||||
| all changes that would otherwise invalidate its name cache. The | ||||
| kind of notification a client asks for may depend on the | ||||
| directory size, its rate of change, and the applications being | ||||
| used to access that directory. The enumeration of the conditions under | ||||
| which a client might ask for a notification is out of the scope | ||||
| of this specification. | ||||
| </t> | ||||
| <t> | ||||
| For attribute notifications, the client | ||||
| will set bits in the gdda_dir_attributes | ||||
| bitmap to indicate which attributes | ||||
| it wants to be notified of. If the server does not support | ||||
| notifications for changes to a certain attribute, it <bcp14>SHOULD NOT</bcp14> | ||||
| set that attribute in the supported attribute bitmap | ||||
| specified in the reply (gddr_dir_attributes). The client will | ||||
| also set in the gdda_child_attributes bitmap the attributes | ||||
| of directory entries it wants to be notified of, and | ||||
| the server will indicate in gddr_child_attributes which | ||||
| attributes of directory entries it will notify the client of. | ||||
| </t> | ||||
| <t> | ||||
| The client will also let the server know if | ||||
| it wants to get the notification as soon as the attribute change | ||||
| occurs or after a certain delay by setting a delay factor; | ||||
| gdda_child_attr_delay is for attribute changes to directory entries and | ||||
| gdda_dir_attr_delay is for attribute changes to the directory. If this | ||||
| delay factor is set to zero, that indicates to the server that | ||||
| the client wants to be notified of any attribute changes as soon | ||||
| as they occur. If the delay factor is set to N seconds, the server will | ||||
| make a best-effort guarantee that attribute updates are | ||||
| synchronized within N seconds. | ||||
| If the client asks | ||||
| for a delay factor that the server does not support or that may | ||||
| cause significant resource consumption on the server by causing | ||||
| the server to send a lot of notifications, the server should not | ||||
| commit to sending out notifications for attributes and | ||||
| therefore must not set the appropriate bit in the | ||||
| gddr_child_attributes and gddr_dir_attributes bitmaps in the response. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>MUST</bcp14> use a security tuple (<xref target="NFSv4_Security_Tuples" format="default"/>) that the | ||||
| directory or its applicable ancestor (<xref target="Security_Service_Negotiation" format="default"/>) is | ||||
| exported with. If not, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_WRONGSEC to the operation that both precedes | ||||
| GET_DIR_DELEGATION and sets the current filehandle | ||||
| (see <xref target="using_secinfo" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The directory delegation covers all the entries in the | ||||
| directory except the parent entry. That means if a directory and | ||||
| its parent both hold directory delegations, any changes to the | ||||
| parent will not cause a notification to be sent for the child | ||||
| even though the child's parent entry points to the parent | ||||
| directory. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_GETDEVICEINFO" numbered="true" toc="default"> | ||||
| <name>Operation 47: GETDEVICEINFO - Get Device Information</name> | ||||
| <section toc="exclude" anchor="OP_GETDEVICEINFO_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETDEVICEINFO4args { | ||||
| deviceid4 gdia_device_id; | ||||
| layouttype4 gdia_layout_type; | ||||
| count4 gdia_maxcount; | ||||
| bitmap4 gdia_notify_types; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICEINFO_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETDEVICEINFO4resok { | ||||
| device_addr4 gdir_device_addr; | ||||
| bitmap4 gdir_notification; | ||||
| }; | ||||
| union GETDEVICEINFO4res switch (nfsstat4 gdir_status) { | ||||
| case NFS4_OK: | ||||
| GETDEVICEINFO4resok gdir_resok4; | ||||
| case NFS4ERR_TOOSMALL: | ||||
| count4 gdir_mincount; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICEINFO_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The GETDEVICEINFO operation returns pNFS storage device address | ||||
| information for the specified device ID. | ||||
| The client identifies the device information to be returned by | ||||
| providing the gdia_device_id and gdia_layout_type that uniquely | ||||
| identify the device. The client provides gdia_maxcount | ||||
| to limit the number of bytes for the result. This maximum size | ||||
| represents all of the data being returned within the | ||||
| GETDEVICEINFO4resok structure and includes the XDR overhead. | ||||
| The server may return less data. If the server is unable to | ||||
| return any information within the gdia_maxcount limit, the error | ||||
| NFS4ERR_TOOSMALL will be returned. However, if gdia_maxcount is | ||||
| zero, NFS4ERR_TOOSMALL <bcp14>MUST NOT</bcp14> be returned. | ||||
| </t> | ||||
| <t> | ||||
| The da_layout_type field of the gdir_device_addr returned | ||||
| by the server <bcp14>MUST</bcp14> be equal to the gdia_layout_type specified | ||||
| by the client. If it is not equal, the client <bcp14>SHOULD</bcp14> ignore | ||||
| the response as invalid and behave as if the server returned | ||||
| an error, even if the client does have support for the | ||||
| layout type returned. | ||||
| </t> | ||||
| <t> | ||||
| The client also provides a notification bitmap, | ||||
| gdia_notify_types, that selects the device ID mapping | ||||
| notifications it is interested in receiving; | ||||
| the server must support device ID notifications | ||||
| for the notification request to have any effect. | ||||
| The notification mask is composed in the same | ||||
| manner as the bitmap for file attributes (<xref target="fattr4" format="default"/>). The numbers of bit positions | ||||
| are listed in the notify_device_type4 enumeration type | ||||
| (<xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>). Only | ||||
| two enumerated values of notify_device_type4 currently | ||||
| apply to GETDEVICEINFO: | ||||
| NOTIFY_DEVICEID4_CHANGE | ||||
| and NOTIFY_DEVICEID4_DELETE (see <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The notification bitmap applies only to the specified device ID. | ||||
| If a client sends a GETDEVICEINFO operation for a device ID multiple times, | ||||
| the last notification bitmap is used by the server for | ||||
| subsequent notifications. If the bitmap is zero or empty, | ||||
| then the device ID's notifications are turned off. | ||||
| </t> | ||||
| <t> | ||||
| If the client wants to just update or turn off notifications, | ||||
| it <bcp14>MAY</bcp14> send a GETDEVICEINFO operation with gdia_maxcount set to zero. | ||||
| In that event, if the device ID is valid, the reply's da_addr_body | ||||
| field of the gdir_device_addr field will be of zero length. | ||||
| </t> | ||||
| <t> | ||||
| If an unknown device ID is given in gdia_device_id, | ||||
| the server returns NFS4ERR_NOENT. | ||||
| Otherwise, the device address | ||||
| information is returned in gdir_device_addr. | ||||
| Finally, if the server supports | ||||
| notifications for device ID mappings, the gdir_notification | ||||
| result will contain a bitmap of which notifications | ||||
| it will actually send to the client (via CB_NOTIFY_DEVICEID, | ||||
| see <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| If NFS4ERR_TOOSMALL is returned, the results also contain | ||||
| gdir_mincount. The value of gdir_mincount represents the | ||||
| minimum size necessary to obtain the device information. | ||||
| </t> | ||||
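| <t> | ||||
| The interaction between gdia_maxcount, NFS4ERR_TOOSMALL, and | ||||
| gdir_mincount can be sketched in Python as follows. The function | ||||
| getdeviceinfo, the dictionary of devices, and the XDR_OVERHEAD constant | ||||
| are assumptions made for illustration; in particular, the overhead value | ||||
| is a placeholder and not a figure taken from this specification. | ||||
| Notification handling is left out. | ||||
| </t> | ||||

```python
XDR_OVERHEAD = 16  # placeholder for XDR framing bytes; not a value from the spec


def getdeviceinfo(devices, device_id, maxcount):
    """devices: dict mapping device ID to an opaque address body (bytes).

    Returns (status, payload): the address body on success, or the
    minimum count (the role of gdir_mincount) on NFS4ERR_TOOSMALL.
    """
    if device_id not in devices:
        return ("NFS4ERR_NOENT", None)
    if maxcount == 0:
        # Zero never yields NFS4ERR_TOOSMALL; the address body comes back empty.
        return ("NFS4_OK", b"")
    needed = len(devices[device_id]) + XDR_OVERHEAD
    if needed > maxcount:
        return ("NFS4ERR_TOOSMALL", needed)
    return ("NFS4_OK", devices[device_id])


devices = {"dev1": b"x" * 40}
probe = getdeviceinfo(devices, "dev1", 0)         # validate the device ID only
too_small = getdeviceinfo(devices, "dev1", 32)    # buffer cannot hold the reply
retry = getdeviceinfo(devices, "dev1", too_small[1])
unknown = getdeviceinfo(devices, "dev2", 1024)
```

| <t> | ||||
| A client that receives NFS4ERR_TOOSMALL can retry with gdia_maxcount | ||||
| set to at least the returned gdir_mincount, as the retry above does. | ||||
| </t> | ||||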
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICEINFO_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Aside from updating or turning off notifications, another | ||||
| use case for gdia_maxcount being set to zero is to validate | ||||
| a device ID. | ||||
| </t> | ||||
| <t> | ||||
| The client <bcp14>SHOULD</bcp14> request a notification for changes or | ||||
| deletion of a device ID to device address mapping so | ||||
| that the server can allow the client to gracefully use a | ||||
| new mapping, without having pending I/O fail abruptly, | ||||
| or force layouts using the device ID to be recalled | ||||
| or revoked. | ||||
| </t> | ||||
| <t> | ||||
| It is possible that GETDEVICEINFO (and | ||||
| GETDEVICELIST) will race with CB_NOTIFY_DEVICEID, | ||||
| i.e., CB_NOTIFY_DEVICEID arrives before the client | ||||
| gets and processes the response to GETDEVICEINFO or | ||||
| GETDEVICELIST. The analysis of the race leverages the | ||||
| fact that the server <bcp14>MUST NOT</bcp14> delete a device ID that | ||||
| is referred to by a layout the client has. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| CB_NOTIFY_DEVICEID deletes a device ID. | ||||
| If the client believes it has layouts that refer to the | ||||
| device ID, then it is possible that layouts referring to | ||||
| the deleted device ID have been revoked. | ||||
| The client should send a TEST_STATEID request using the | ||||
| stateid for each layout that might have been revoked. If | ||||
| TEST_STATEID indicates that any layouts have been revoked, the | ||||
| client must recover from layout revocation as described in | ||||
| <xref target="revoke_layout" format="default"/>. If TEST_STATEID indicates that at least | ||||
| one layout has not been revoked, the client should send | ||||
| a GETDEVICEINFO operation on the supposedly deleted | ||||
| device ID to verify that the device ID | ||||
| has been deleted. | ||||
| </t> | ||||
| <t> | ||||
| If GETDEVICEINFO indicates that the device ID | ||||
| does not exist, then the client assumes the server is faulty | ||||
| and recovers by sending an EXCHANGE_ID operation. If GETDEVICEINFO | ||||
| indicates that the device ID does exist, then while the server is | ||||
| faulty for sending an erroneous device ID deletion notification, | ||||
| the degree to which it is faulty does not require the client to | ||||
| create a new client ID. | ||||
| </t> | ||||
| <t> | ||||
| If the client does not have layouts that refer to the | ||||
| device ID, no harm is done. | ||||
| The client should mark the device ID as deleted, and when | ||||
| GETDEVICEINFO or GETDEVICELIST results are | ||||
| received that indicate that the device ID has been | ||||
| in fact deleted, the device ID should be removed from the | ||||
| client's cache. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| CB_NOTIFY_DEVICEID indicates that a device ID's device | ||||
| addressing mappings have changed. The client should assume | ||||
| that the results from the in-progress GETDEVICEINFO | ||||
| will be stale for the device ID | ||||
| once received, and so it should send another GETDEVICEINFO | ||||
| on the device ID. | ||||
| </li> | ||||
| </ul> | ||||
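| <t> | ||||
| The client-side handling of a device ID deletion notification described | ||||
| above can be sketched in Python. The function on_deviceid_delete, the | ||||
| action labels it returns, and the callable used to model the follow-up | ||||
| GETDEVICEINFO are all hypothetical names introduced for illustration. | ||||
| </t> | ||||

```python
def on_deviceid_delete(layout_revoked, device_still_exists):
    """Decide what a client does after CB_NOTIFY_DEVICEID deletes a device ID.

    layout_revoked: one bool per layout the client believes refers to the
        device ID, True if TEST_STATEID reports that layout as revoked.
    device_still_exists: callable modeling a follow-up GETDEVICEINFO;
        returns True if the server still knows the device ID.
    """
    if not layout_revoked:
        # No layouts refer to the device ID: no harm is done.
        return ["mark_device_deleted"]
    actions = []
    if any(layout_revoked):
        actions.append("recover_from_layout_revocation")
    if not all(layout_revoked):
        # A layout survived, so the device ID should not have been deleted.
        if device_still_exists():
            actions.append("keep_client_id")      # server faulty, but tolerable
        else:
            actions.append("send_EXCHANGE_ID")    # server faulty; recover
    return actions


no_layouts = on_deviceid_delete([], lambda: False)
all_revoked = on_deviceid_delete([True, True], lambda: False)
mixed_gone = on_deviceid_delete([True, False], lambda: False)
intact_present = on_deviceid_delete([False], lambda: True)
```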
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_GETDEVICELIST" numbered="true" toc="default"> | ||||
| <name>Operation 48: GETDEVICELIST - Get All Device Mappings for a File System</name> | ||||
| <section toc="exclude" anchor="OP_GETDEVICELIST_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETDEVICELIST4args { | ||||
| /* CURRENT_FH: object belonging to the file system */ | ||||
| layouttype4 gdla_layout_type; | ||||
| /* number of deviceIDs to return */ | ||||
| count4 gdla_maxdevices; | ||||
| nfs_cookie4 gdla_cookie; | ||||
| verifier4 gdla_cookieverf; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICELIST_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct GETDEVICELIST4resok { | ||||
| nfs_cookie4 gdlr_cookie; | ||||
| verifier4 gdlr_cookieverf; | ||||
| deviceid4 gdlr_deviceid_list<>; | ||||
| bool gdlr_eof; | ||||
| }; | ||||
| union GETDEVICELIST4res switch (nfsstat4 gdlr_status) { | ||||
| case NFS4_OK: | ||||
| GETDEVICELIST4resok gdlr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICELIST_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is used by the client to enumerate all of the | ||||
| device IDs that a server's file system uses. | ||||
| </t> | ||||
| <t> | ||||
| The client provides a current filehandle of a file object that | ||||
| belongs to the file system (i.e., all file objects sharing the same | ||||
| fsid as that of the current filehandle) and the layout type | ||||
| in gdla_layout_type. Since | ||||
| this operation might require multiple calls to enumerate all the | ||||
| device IDs (and is thus | ||||
| similar to the <xref target="OP_READDIR" format="default"> | ||||
| READDIR</xref> operation), the client also provides gdla_cookie | ||||
| and gdla_cookieverf to specify the current cursor position in the | ||||
| list. When the client wants to read from the beginning of the | ||||
| file system's device mappings, it sets gdla_cookie to zero. The | ||||
| field gdla_cookieverf <bcp14>MUST</bcp14> be ignored by the server when | ||||
| gdla_cookie is zero. | ||||
| The client provides gdla_maxdevices to limit the number of device IDs | ||||
| in the result. If gdla_maxdevices is zero, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_INVAL. | ||||
| The server <bcp14>MAY</bcp14> return fewer device IDs. | ||||
| </t> | ||||
| <t> | ||||
| The successful response to the operation will contain the | ||||
| cookie, gdlr_cookie, and the cookie verifier, gdlr_cookieverf, to be | ||||
| used on the subsequent GETDEVICELIST. A gdlr_eof value of TRUE | ||||
| signifies that there are no remaining entries in the server's | ||||
| device list. Each element of gdlr_deviceid_list contains | ||||
| a device ID. | ||||
| </t> | ||||
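| <t> | ||||
| The cursor-style enumeration can be sketched in Python as a loop that | ||||
| repeats GETDEVICELIST until gdlr_eof is TRUE. The functions | ||||
| getdevicelist and enumerate_devices are hypothetical, the cookie is | ||||
| modeled as a list index purely for illustration (real cookies are | ||||
| chosen by the server), and the cookie verifier is omitted. | ||||
| </t> | ||||

```python
def getdevicelist(devices, cookie, maxdevices):
    """Toy server: return (status, (next_cookie, device_ids, eof))."""
    if maxdevices == 0:
        return ("NFS4ERR_INVAL", None)
    chunk = devices[cookie:cookie + maxdevices]   # may be fewer than maxdevices
    next_cookie = cookie + len(chunk)
    eof = next_cookie >= len(devices)
    return ("NFS4_OK", (next_cookie, chunk, eof))


def enumerate_devices(devices, maxdevices):
    """Client loop: start at cookie zero and continue until eof is reported."""
    cookie = 0
    found = []
    calls = 0
    while True:
        status, result = getdevicelist(devices, cookie, maxdevices)
        calls += 1
        if status != "NFS4_OK":
            raise RuntimeError(status)
        cookie, chunk, eof = result
        found.extend(chunk)
        if eof:
            return found, calls


all_ids = ["dev-a", "dev-b", "dev-c", "dev-d", "dev-e"]
listed, calls = enumerate_devices(all_ids, maxdevices=2)
```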
| </section> | ||||
| <section toc="exclude" anchor="OP_GETDEVICELIST_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| An example of the use of this operation is for pNFS | ||||
| clients and servers that use LAYOUT4_BLOCK_VOLUME | ||||
| layouts. In these environments it may be helpful | ||||
| for a client to determine device accessibility upon | ||||
| first file system access. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LAYOUTCOMMIT" numbered="true" toc="default"> | ||||
| <name>Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a Layout</name> | ||||
| <section toc="exclude" anchor="OP_LAYOUTCOMMIT_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union newtime4 switch (bool nt_timechanged) { | ||||
| case TRUE: | ||||
| nfstime4 nt_time; | ||||
| case FALSE: | ||||
| void; | ||||
| }; | ||||
| union newoffset4 switch (bool no_newoffset) { | ||||
| case TRUE: | ||||
| offset4 no_offset; | ||||
| case FALSE: | ||||
| void; | ||||
| }; | ||||
| struct LAYOUTCOMMIT4args { | ||||
| /* CURRENT_FH: file */ | ||||
| offset4 loca_offset; | ||||
| length4 loca_length; | ||||
| bool loca_reclaim; | ||||
| stateid4 loca_stateid; | ||||
| newoffset4 loca_last_write_offset; | ||||
| newtime4 loca_time_modify; | ||||
| layoutupdate4 loca_layoutupdate; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTCOMMIT_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union newsize4 switch (bool ns_sizechanged) { | ||||
| case TRUE: | ||||
| length4 ns_size; | ||||
| case FALSE: | ||||
| void; | ||||
| }; | ||||
| struct LAYOUTCOMMIT4resok { | ||||
| newsize4 locr_newsize; | ||||
| }; | ||||
| union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) { | ||||
| case NFS4_OK: | ||||
| LAYOUTCOMMIT4resok locr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTCOMMIT_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LAYOUTCOMMIT operation commits changes in the layout represented by the current | ||||
| filehandle, client ID (derived from the session ID in the | ||||
| preceding SEQUENCE operation), byte-range, and stateid. Since | ||||
| layouts are sub-dividable, a smaller portion of a layout, | ||||
| retrieved via LAYOUTGET, can be committed. The byte-range being | ||||
| committed is specified through the byte-range (loca_offset and | ||||
| loca_length). This byte-range <bcp14>MUST</bcp14> overlap with one or more existing layouts | ||||
| previously granted via LAYOUTGET (<xref target="OP_LAYOUTGET" format="default"/>), | ||||
| each with an iomode of LAYOUTIOMODE4_RW. In the | ||||
| case where the iomode of any held layout segment is not | ||||
| LAYOUTIOMODE4_RW, the server should return the error | ||||
| NFS4ERR_BADIOMODE. For the case where the client | ||||
| does not hold matching layout segment(s) for the | ||||
| defined byte-range, the server should return the error | ||||
| NFS4ERR_BADLAYOUT. | ||||
| </t> | ||||
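| <t> | ||||
| As a non-normative illustration, the overlap and iomode checks described above might be sketched as follows. The Segment record, the helper names, and the order in which the two errors are tested are assumptions made for this example only; a real metadata server tracks layout state in its own way. | ||||
| </t> | ||||

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2


@dataclass
class Segment:
    """Hypothetical record of one layout segment held by the client."""
    offset: int
    length: int  # NFS4_UINT64_MAX means "through end-of-file"
    iomode: int


def range_end(offset, length):
    """Exclusive end of a byte-range; open-ended ranges extend to 2**64."""
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX + 1
    return offset + length


def check_commit_range(held, loca_offset, loca_length):
    """Return the status name a server might choose for a LAYOUTCOMMIT range."""
    lo, hi = loca_offset, range_end(loca_offset, loca_length)
    overlapping = [s for s in held
                   if s.offset < hi and lo < range_end(s.offset, s.length)]
    if not overlapping:
        # The client holds no layout segment matching the byte-range.
        return "NFS4ERR_BADLAYOUT"
    if any(s.iomode != LAYOUTIOMODE4_RW for s in overlapping):
        # A matching segment exists but is not writable.
        return "NFS4ERR_BADIOMODE"
    return "NFS4_OK"
```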
| <t> | ||||
| The LAYOUTCOMMIT operation indicates that the client has | ||||
| completed writes using a layout obtained by a previous | ||||
| LAYOUTGET. The client may have only written a subset of the | ||||
| data range it previously requested. LAYOUTCOMMIT allows it to | ||||
| commit or discard provisionally allocated space and to update | ||||
| the server with a new end-of-file. The layout referenced by | ||||
| LAYOUTCOMMIT remains valid after the operation completes and | ||||
| can continue to be referenced by the client ID, filehandle, | ||||
| byte-range, layout type, and stateid. | ||||
| </t> | ||||
| <t> | ||||
| If the loca_reclaim field is set to TRUE, this indicates that | ||||
| the client is attempting to commit changes to a layout after the | ||||
| restart of the metadata server during the metadata server's | ||||
| recovery grace period (see <xref target="mds_recovery" format="default"/>). This type of request may be necessary | ||||
| when the client has uncommitted writes to provisionally | ||||
| allocated byte-ranges of a file that were sent to the storage | ||||
| devices before the restart of the metadata server. In this case, | ||||
| the layout provided by the client <bcp14>MUST</bcp14> be a subset of a writable | ||||
| layout that the client held immediately before the restart of the | ||||
| metadata server. The value of the field loca_stateid <bcp14>MUST</bcp14> | ||||
| be a value that the metadata server returned before it restarted. | ||||
| The metadata server is free to accept or | ||||
| reject this request based on its own internal metadata | ||||
| consistency checks. If the metadata server finds that the | ||||
| layout provided by the client does not pass its consistency | ||||
| checks, it <bcp14>MUST</bcp14> reject the request with the status | ||||
| NFS4ERR_RECLAIM_BAD. The successful completion of the | ||||
| LAYOUTCOMMIT request with loca_reclaim set to TRUE does NOT | ||||
| provide the client with a layout for the file. It simply | ||||
| commits the changes to the layout specified in the | ||||
| loca_layoutupdate field. To obtain a layout for the file, the | ||||
| client must send a LAYOUTGET request to the server after the | ||||
| server's grace period has expired. If the metadata server | ||||
| receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE | ||||
| when the metadata server is not in its recovery grace period, it | ||||
| <bcp14>MUST</bcp14> reject the request with the status NFS4ERR_NO_GRACE. | ||||
| </t> | ||||
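| <t> | ||||
| The reclaim-specific status rules in the preceding paragraph can be summarized by the following non-normative sketch. The function name and its boolean inputs are hypothetical; in particular, passes_consistency_checks stands in for whatever internal metadata checks a given server performs. The sketch returns None when loca_reclaim is FALSE because no reclaim-specific rule applies in that case. | ||||
| </t> | ||||

```python
def reclaim_commit_status(loca_reclaim, in_grace, passes_consistency_checks):
    """Status name for a LAYOUTCOMMIT with loca_reclaim set, or None.

    All three parameters are illustrative booleans, not protocol fields
    (apart from loca_reclaim itself).
    """
    if not loca_reclaim:
        return None  # ordinary LAYOUTCOMMIT; handled elsewhere
    if not in_grace:
        # Reclaim attempted outside the recovery grace period.
        return "NFS4ERR_NO_GRACE"
    if not passes_consistency_checks:
        # The server's own metadata checks rejected the provided layout.
        return "NFS4ERR_RECLAIM_BAD"
    return "NFS4_OK"
```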
| <t> | ||||
| Setting the loca_reclaim field to TRUE is required if and only | ||||
| if the committed layout was acquired before the metadata server | ||||
| restart. If the client is committing a layout that was acquired | ||||
| during the metadata server's grace period, it <bcp14>MUST</bcp14> set the | ||||
| loca_reclaim field to FALSE. | ||||
| </t> | ||||
| <t> | ||||
| The loca_stateid is a layout stateid value as | ||||
| returned by previously successful layout operations | ||||
| (see <xref target="layout_stateid" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The loca_last_write_offset field specifies the offset of the | ||||
| last byte written by the client before the LAYOUTCOMMIT. | ||||
| Note that this value is never equal to the file's size (at most | ||||
| it is one byte less than the file's size) and <bcp14>MUST</bcp14> be less than | ||||
| or equal to NFS4_MAXFILEOFF. Also, loca_last_write_offset <bcp14>MUST</bcp14> | ||||
| overlap the range described by loca_offset and loca_length. | ||||
| The metadata server | ||||
| may use this information to determine whether the file's size | ||||
| needs to be updated. If the metadata server updates the file's | ||||
| size as the result of the LAYOUTCOMMIT operation, it must return | ||||
| the new size (locr_newsize.ns_size) as part of the results. | ||||
| </t> | ||||
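| <t> | ||||
| The relationship between loca_last_write_offset and the file's size can be illustrated with a short, non-normative sketch. It assumes one possible server policy (grow the file whenever the last written byte lies at or beyond the current end-of-file); the function name and the use of None for the FALSE arm of newoffset4 are conventions of this example only. | ||||
| </t> | ||||

```python
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFE


def size_after_commit(current_size, last_write_offset):
    """Return (new_size, size_changed) for a LAYOUTCOMMIT.

    last_write_offset is None when no_newoffset is FALSE.
    """
    if last_write_offset is None:
        return current_size, False
    if last_write_offset > NFS4_MAXFILEOFF:
        raise ValueError("loca_last_write_offset exceeds NFS4_MAXFILEOFF")
    # The last written byte is at most one less than the file's size,
    # so the smallest size that contains it is offset + 1.
    candidate = last_write_offset + 1
    if candidate > current_size:
        return candidate, True  # would be reported as locr_newsize.ns_size
    return current_size, False
```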
| <t> | ||||
| The loca_time_modify field | ||||
| allows the client to suggest a modification time it would like the metadata | ||||
| server to set. The metadata server may use the suggestion or | ||||
| it may use the time of the LAYOUTCOMMIT operation to set the modification | ||||
| time. If the metadata server uses the client-provided | ||||
| modification time, it should ensure that time does not flow backwards. If the | ||||
| client wants to force the metadata server to set an exact time, | ||||
| the client should use a SETATTR operation in a COMPOUND right | ||||
| after LAYOUTCOMMIT. See <xref target="committing_layout" format="default"/> for | ||||
| more details. If the client desires the resultant modification time, | ||||
| it should construct the COMPOUND so that a GETATTR | ||||
| follows the LAYOUTCOMMIT. | ||||
| </t> | ||||
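| <t> | ||||
| One way a metadata server might keep the modification time from flowing backwards is sketched below. This is an assumed policy shown for illustration, not a requirement: the server takes the client's suggestion if one is present, falls back to its own clock otherwise, and never selects a value earlier than the time already recorded. Times are modeled as (seconds, nseconds) tuples, mirroring nfstime4; the function name is hypothetical. | ||||
| </t> | ||||

```python
def choose_mtime(current, suggested, now):
    """Pick the modification time to record after a LAYOUTCOMMIT.

    current   -- time presently recorded for the file
    suggested -- client's loca_time_modify value, or None if not supplied
    now       -- server's clock at the time of the operation
    Each time is a (seconds, nseconds) tuple; tuples compare in that order.
    """
    candidate = suggested if suggested is not None else now
    # Never move the recorded time backwards.
    return max(candidate, current)
```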
| <t> | ||||
| The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism | ||||
| for a client to provide layout-specific updates to the metadata | ||||
| server. For example, the layout update can describe what byte-ranges | ||||
| of the original layout have been used and what byte-ranges can be | ||||
| deallocated. There is no NFSv4.1 file layout-specific layoutupdate4 | ||||
| structure. | ||||
| </t> | ||||
| <t> | ||||
| The layout information is more verbose for block devices than for | ||||
| objects and files because the latter two hide the details of block | ||||
| allocation behind their storage protocols. At the minimum, the | ||||
| client needs to communicate changes to the end-of-file location back | ||||
| to the server, and, if desired, its view of the file's modification | ||||
| time. For block/volume layouts, it needs to specify precisely | ||||
| which blocks have been used. | ||||
| </t> | ||||
| <t> | ||||
| If the layout identified in the arguments does not exist, the | ||||
| error NFS4ERR_BADLAYOUT is returned. The layout being committed | ||||
| may also be rejected if it does not correspond to an existing | ||||
| layout with an iomode of LAYOUTIOMODE4_RW. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value and the | ||||
| current stateid retains its value. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTCOMMIT_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The client <bcp14>MAY</bcp14> also use LAYOUTCOMMIT with the | ||||
| loca_reclaim field set to TRUE to convey hints about modified file | ||||
| attributes or to report layout-type-specific information such as | ||||
| I/O errors for object-based storage layouts, as is done | ||||
| during normal operation. Doing so may help the metadata server | ||||
| to recover files more efficiently after restart. For example, | ||||
| some file system implementations may require expensive recovery | ||||
| of file system objects if the metadata server does not get a | ||||
| positive indication from all clients holding a LAYOUTIOMODE4_RW layout that | ||||
| they have successfully completed all their writes. Sending a | ||||
| LAYOUTCOMMIT (if required) and then following with LAYOUTRETURN | ||||
| can provide such an indication and allow for graceful and | ||||
| efficient recovery. | ||||
| </t> | ||||
| <t> | ||||
| If loca_reclaim is TRUE, the metadata server is free to | ||||
| either examine or ignore the value in the field loca_stateid. | ||||
| The metadata server implementation might or might not | ||||
| encode in its layout | ||||
| stateid information that allows the metadata server to | ||||
| perform a consistency check on the LAYOUTCOMMIT request. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LAYOUTGET" numbered="true" toc="default"> | ||||
| <name>Operation 50: LAYOUTGET - Get Layout Information</name> | ||||
| <section toc="exclude" anchor="OP_LAYOUTGET_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LAYOUTGET4args { | ||||
| /* CURRENT_FH: file */ | ||||
| bool loga_signal_layout_avail; | ||||
| layouttype4 loga_layout_type; | ||||
| layoutiomode4 loga_iomode; | ||||
| offset4 loga_offset; | ||||
| length4 loga_length; | ||||
| length4 loga_minlength; | ||||
| stateid4 loga_stateid; | ||||
| count4 loga_maxcount; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTGET_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct LAYOUTGET4resok { | ||||
| bool logr_return_on_close; | ||||
| stateid4 logr_stateid; | ||||
| layout4 logr_layout<>; | ||||
| }; | ||||
| union LAYOUTGET4res switch (nfsstat4 logr_status) { | ||||
| case NFS4_OK: | ||||
| LAYOUTGET4resok logr_resok4; | ||||
| case NFS4ERR_LAYOUTTRYLATER: | ||||
| bool logr_will_signal_layout_avail; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTGET_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The LAYOUTGET operation requests a layout from the metadata server for reading or | ||||
| writing the file given by the filehandle at the | ||||
| byte-range specified by offset and length. Layouts are | ||||
| identified by the client ID (derived from the session ID in the | ||||
| preceding SEQUENCE operation), current filehandle, layout type | ||||
| (loga_layout_type), and the layout stateid (loga_stateid). The | ||||
| use of the loga_iomode field depends upon the layout type, but should | ||||
| reflect the client's data access intent. | ||||
| </t> | ||||
| <t> | ||||
| If the metadata server is in a grace period, and does not | ||||
| persist layouts and device ID to device address mappings, then | ||||
| it <bcp14>MUST</bcp14> return NFS4ERR_GRACE (see <xref target="reclaim_locks" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The LAYOUTGET operation returns layout information | ||||
| for the specified byte-range: a layout. | ||||
| The client actually specifies two ranges, both starting | ||||
| at the offset in the loga_offset field. The first | ||||
| range is between loga_offset and loga_offset + loga_length - 1 | ||||
| inclusive. This range indicates the desired range the client | ||||
| wants the layout to cover. The second range is between | ||||
| loga_offset and loga_offset + loga_minlength - 1 inclusive. This | ||||
| range indicates the required range the client needs the layout | ||||
| to cover. Thus, loga_minlength <bcp14>MUST</bcp14> be less than or equal to | ||||
| loga_length. | ||||
| </t> | ||||
| <t> | ||||
| When a length field is set to NFS4_UINT64_MAX, | ||||
| this indicates a desire (when loga_length is NFS4_UINT64_MAX) | ||||
| or requirement (when loga_minlength is NFS4_UINT64_MAX) | ||||
| to get a layout from loga_offset through the | ||||
| end-of-file, regardless of the file's length. | ||||
| </t> | ||||
| <t> | ||||
| The following rules govern the relationships among, | ||||
| and the minima of, | ||||
| loga_length, loga_minlength, and loga_offset. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If loga_length is less than loga_minlength, the metadata server | ||||
| <bcp14>MUST</bcp14> return NFS4ERR_INVAL. | ||||
| </li> | ||||
| <li> | ||||
| If loga_minlength is zero, this is an indication | ||||
| to the metadata server that the client desires any layout | ||||
| at offset loga_offset or less that the metadata server has | ||||
| "readily available". What counts as readily available is subjective and depends on | ||||
| the layout type and the pNFS server implementation. For example, | ||||
| some metadata servers might have to pre-allocate stable | ||||
| storage when they receive a request for a range of a | ||||
| file that goes beyond the file's current length. | ||||
| If loga_minlength is zero and | ||||
| loga_length is greater than zero, this tells the | ||||
| metadata server what range of the layout the client would | ||||
| prefer to have. If loga_length and loga_minlength | ||||
| are both zero, then the client is indicating that it desires | ||||
| a layout of any length with the ending offset of the range | ||||
| no less than the value specified by loga_offset, and the starting offset at or | ||||
| below loga_offset. If the metadata server does not have | ||||
| a layout that is readily available, then it <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_LAYOUTTRYLATER. | ||||
| </li> | ||||
| <li> | ||||
| If the sum of loga_offset and loga_minlength exceeds | ||||
| NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, | ||||
| the error NFS4ERR_INVAL <bcp14>MUST</bcp14> result. | ||||
| </li> | ||||
| <li> | ||||
| If the sum of loga_offset and loga_length exceeds | ||||
| NFS4_UINT64_MAX, and loga_length is not NFS4_UINT64_MAX, | ||||
| the error NFS4ERR_INVAL <bcp14>MUST</bcp14> result. | ||||
| </li> | ||||
| </ul> | ||||
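| <t> | ||||
| The three NFS4ERR_INVAL rules in the list above can be expressed as a short, non-normative sketch. The function name is hypothetical, and the sketch deliberately does not model the NFS4ERR_LAYOUTTRYLATER case, which depends on what the server has readily available. | ||||
| </t> | ||||

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def check_layoutget_ranges(loga_offset, loga_length, loga_minlength):
    """Apply the argument checks on the two requested ranges."""
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    for length in (loga_minlength, loga_length):
        # NFS4_UINT64_MAX means "through end-of-file" and is exempt
        # from the overflow check.
        if length != NFS4_UINT64_MAX and loga_offset + length > NFS4_UINT64_MAX:
            return "NFS4ERR_INVAL"
    return "NFS4_OK"
```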
| <t> | ||||
| After the metadata server has performed the above checks on loga_offset, | ||||
| loga_minlength, and loga_length, the metadata server <bcp14>MUST</bcp14> return a | ||||
| layout according to the rules in <xref target="layout_hell" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Acceptable layouts based on loga_minlength. | ||||
| Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; | ||||
| a_minlen = loga_minlength. | ||||
| </t> | ||||
| <table anchor="layout_hell" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Layout iomode of request</th> | ||||
| <th align="left">Layout a_minlen of request</th> | ||||
| <th align="left">Layout iomode of reply</th> | ||||
| <th align="left">Layout offset of reply</th> | ||||
| <th align="left">Layout length of reply</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be >= file length - layout offset</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be u64m</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be >= MIN(file length, a_minlen + a_off) - layout offset</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be >= a_off - layout offset + a_minlen</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be > 0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be > 0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be u64m</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be >= a_off - layout offset + a_minlen</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be > 0</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
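| <t> | ||||
| Read as a predicate, the table above might be checked as in the following non-normative sketch. The function name, the short constant names, and the idea of passing the file length as a parameter are assumptions of this example; the sketch tests only the requirements in the table and ignores the preferences based on loga_length. | ||||
| </t> | ||||

```python
U64M = 0xFFFFFFFFFFFFFFFF  # NFS4_UINT64_MAX
READ, RW = 1, 2            # LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW


def acceptable_reply(req_iomode, a_off, a_minlen, file_length,
                     rep_iomode, rep_off, rep_len):
    """True if a reply meets the loga_minlength-based requirements."""
    if rep_iomode not in (READ, RW):
        return False
    if req_iomode == RW and rep_iomode != RW:
        return False  # a _RW request is never answered with _READ
    if rep_off > a_off:
        return False  # the layout must start at or below a_off
    if a_minlen == 0:
        return rep_len > 0
    if rep_iomode == RW:
        if a_minlen == U64M:
            return rep_len == U64M
        return rep_len >= a_off - rep_off + a_minlen
    # _READ reply: required coverage may stop at end-of-file.
    if a_minlen == U64M:
        return rep_len >= file_length - rep_off
    return rep_len >= min(file_length, a_minlen + a_off) - rep_off
```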
| <t> | ||||
| If loga_minlength is not zero and the metadata server cannot return a layout according | ||||
| to the rules in <xref target="layout_hell" format="default"/>, | ||||
| then the metadata server <bcp14>MUST</bcp14> return the error | ||||
| NFS4ERR_BADLAYOUT. If loga_minlength is zero and the metadata server | ||||
| cannot or will not return a layout according | ||||
| to the rules in <xref target="layout_hell" format="default"/>, | ||||
| then the metadata server <bcp14>MUST</bcp14> return the error | ||||
| NFS4ERR_LAYOUTTRYLATER. | ||||
| Assuming that loga_length is greater | ||||
| than loga_minlength or equal to zero, the metadata server <bcp14>SHOULD</bcp14> | ||||
| return a layout according to the rules in <xref target="layout_hell2" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Desired layouts based on loga_length. | ||||
| The rules of <xref target="layout_hell" format="default"/> <bcp14>MUST</bcp14> be applied first. | ||||
| Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; | ||||
| a_len = loga_length. | ||||
| </t> | ||||
| <table anchor="layout_hell2" align="center"> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Layout iomode of request</th> | ||||
| <th align="left">Layout a_len of request</th> | ||||
| <th align="left">Layout iomode of reply</th> | ||||
| <th align="left">Layout offset of reply</th> | ||||
| <th align="left">Layout length of reply</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be u64m</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be u64m</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be >= a_off - layout offset + a_len</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be >= a_off - layout offset + a_len</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _READ</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be > a_off - layout offset</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_READ</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MAY</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be > a_off - layout offset</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">u64m</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be u64m</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">> 0 and < u64m</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be >= a_off - layout offset + a_len</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">_RW</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be _RW</td> | ||||
| <td align="left"><bcp14>MUST</bcp14> be <= a_off</td> | ||||
| <td align="left"><bcp14>SHOULD</bcp14> be > a_off - layout offset</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| The loga_stateid field specifies a valid stateid. | ||||
| If a layout is not currently held by the client, | ||||
| the loga_stateid field represents a stateid | ||||
| reflecting the correspondingly valid open, | ||||
| byte-range lock, or delegation stateid. Once a | ||||
| layout is held on the file by the client, the | ||||
| loga_stateid field <bcp14>MUST</bcp14> be a stateid as returned from | ||||
| a previous LAYOUTGET or LAYOUTRETURN operation or | ||||
| provided by a CB_LAYOUTRECALL operation (see <xref target="layout_stateid" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The loga_maxcount field specifies the maximum layout size (in bytes) | ||||
| that the client can handle. If the size of the layout structure | ||||
| exceeds the size specified by loga_maxcount, the metadata server will | ||||
| return the NFS4ERR_TOOSMALL error. | ||||
| </t> | ||||
| <t> | ||||
| The returned layout is expressed as an array, | ||||
| logr_layout, with each element of type layout4. If a | ||||
| file has a single striping pattern, then logr_layout | ||||
| <bcp14>SHOULD</bcp14> contain just one entry. Otherwise, if the | ||||
| requested range overlaps more than one striping | ||||
| pattern, logr_layout will contain the required number | ||||
| of entries. The elements of logr_layout <bcp14>MUST</bcp14> be sorted | ||||
| in ascending order of the value of the lo_offset field | ||||
| of each element. There <bcp14>MUST</bcp14> be no gaps or overlaps | ||||
| in the range between two successive elements of | ||||
| logr_layout. The lo_iomode field in each element of | ||||
| logr_layout <bcp14>MUST</bcp14> be the same. | ||||
| </t> | ||||
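| <t> | ||||
| A client that wishes to sanity-check the ordering, contiguity, and iomode rules stated above could do so along the lines of the following non-normative sketch. Representing each layout4 element as an (lo_offset, lo_length, lo_iomode) tuple and the function name are choices made for this example; the sketch also assumes every lo_length is nonzero. | ||||
| </t> | ||||

```python
def check_layout_array(segments):
    """Check a logr_layout array given as (lo_offset, lo_length, lo_iomode) tuples.

    Returns True when successive elements are contiguous (which, for
    nonzero lengths, also implies ascending order with no gaps or
    overlaps) and every element carries the same lo_iomode.
    """
    for prev, cur in zip(segments, segments[1:]):
        if cur[0] != prev[0] + prev[1]:
            return False  # gap, overlap, or out-of-order element
    return len({iomode for _, _, iomode in segments}) <= 1
```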
| <t> | ||||
| <xref target="layout_hell" format="default"/> | ||||
| and | ||||
| <xref target="layout_hell2" format="default"/> | ||||
| both refer to a returned layout iomode, offset, and length. | ||||
| Because the returned layout is encoded in the logr_layout array, | ||||
| more description is required. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>iomode</dt> | ||||
| <dd> | ||||
| The value of the returned layout iomode listed in | ||||
| <xref target="layout_hell" format="default"/> | ||||
| and | ||||
| <xref target="layout_hell2" format="default"/> | ||||
| is equal to the value of the lo_iomode field in each | ||||
| element of logr_layout. | ||||
| As shown in <xref target="layout_hell" format="default"/> | ||||
| and <xref target="layout_hell2" format="default"/>, | ||||
| the metadata server <bcp14>MAY</bcp14> return a layout with an lo_iomode | ||||
| different from the requested iomode (field loga_iomode of the request). | ||||
| If it does so, it <bcp14>MUST</bcp14> | ||||
| ensure that the lo_iomode is more permissive than the | ||||
| loga_iomode requested. For example, this behavior allows an | ||||
| implementation to upgrade LAYOUTIOMODE4_READ requests to LAYOUTIOMODE4_RW | ||||
| requests at its discretion, within the limits of the layout type | ||||
| specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or | ||||
| LAYOUTIOMODE4_RW <bcp14>MUST</bcp14> be returned. | ||||
| </dd> | ||||
| <dt>offset</dt> | ||||
| <dd> | ||||
| The value of the returned layout offset listed in | ||||
| <xref target="layout_hell" format="default"/> | ||||
| and | ||||
| <xref target="layout_hell2" format="default"/> | ||||
| is always equal to the lo_offset field of the first | ||||
| element of logr_layout. | ||||
| </dd> | ||||
| <dt>length</dt> | ||||
| <dd> | ||||
| <t> | ||||
| When setting the value of the returned layout | ||||
| length, the situation is complicated by the | ||||
| possibility that the special layout length value | ||||
| NFS4_UINT64_MAX is involved. For a logr_layout | ||||
| array of N elements, the lo_length field in the | ||||
| first N-1 elements <bcp14>MUST NOT</bcp14> be NFS4_UINT64_MAX. The | ||||
| lo_length field of the last element of logr_layout | ||||
| can be NFS4_UINT64_MAX under some conditions as | ||||
| described in the following list. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If an applicable rule of <xref target="layout_hell" format="default"/> | ||||
| states that the metadata server <bcp14>MUST</bcp14> return a layout of length | ||||
| NFS4_UINT64_MAX, then the lo_length field of the last | ||||
| element of logr_layout <bcp14>MUST</bcp14> be NFS4_UINT64_MAX. | ||||
| </li> | ||||
| <li> | ||||
| If an applicable rule of <xref target="layout_hell" format="default"/> | ||||
| states that the metadata server <bcp14>MUST NOT</bcp14> return a layout of length | ||||
| NFS4_UINT64_MAX, then the lo_length field of the last | ||||
| element of logr_layout <bcp14>MUST NOT</bcp14> be NFS4_UINT64_MAX. | ||||
| </li> | ||||
| <li> | ||||
| If an applicable rule of <xref target="layout_hell2" format="default"/> | ||||
| states that the metadata server <bcp14>SHOULD</bcp14> return a layout of length | ||||
| NFS4_UINT64_MAX, then the lo_length field of the last | ||||
| element of logr_layout <bcp14>SHOULD</bcp14> be NFS4_UINT64_MAX. | ||||
| </li> | ||||
| <li> | ||||
| When the value of the returned layout length of | ||||
| <xref target="layout_hell" format="default"/> | ||||
| and | ||||
| <xref target="layout_hell2" format="default"/> is not NFS4_UINT64_MAX, then | ||||
| the returned layout length is equal to the sum of the | ||||
| lo_length fields of each element of logr_layout. | ||||
| </li> | ||||
| </ul> | ||||
| </dd> | ||||
| </dl> | ||||
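| <t> | ||||
| The mapping from the logr_layout array to a single returned iomode, offset, and length, as described in the definitions above, can be sketched as follows. This is a non-normative illustration; the tuple representation of layout4 elements and the function name are assumptions of the example, and the sketch expects at least one element. | ||||
| </t> | ||||

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def returned_layout(segments):
    """Summarize logr_layout as (iomode, offset, length).

    segments is a non-empty list of (lo_offset, lo_length, lo_iomode)
    tuples in logr_layout order.
    """
    if not segments:
        raise ValueError("this sketch expects at least one element")
    if any(length == NFS4_UINT64_MAX for _, length, _ in segments[:-1]):
        raise ValueError("only the last element may use NFS4_UINT64_MAX")
    offset, _, iomode = segments[0]
    if segments[-1][1] == NFS4_UINT64_MAX:
        # An open-ended last element makes the whole layout open-ended.
        length = NFS4_UINT64_MAX
    else:
        length = sum(seg_len for _, seg_len, _ in segments)
    return iomode, offset, length
```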
| <t> | ||||
| The logr_return_on_close result field is a directive to return | ||||
| the layout before closing the file. When the metadata server sets this | ||||
| return value to TRUE, it <bcp14>MUST</bcp14> be prepared to recall the layout | ||||
| in the case in which the client fails to return the layout before close. | ||||
| For the metadata server that knows a layout must be returned before a | ||||
| close of the file, this return value can be used to communicate | ||||
| the desired behavior to the client and thus remove one extra | ||||
| step from the client's and metadata server's interaction. | ||||
| </t> | ||||
| <t> | ||||
| The logr_stateid stateid is returned to | ||||
| the client for use in subsequent layout related operations. See Sections | ||||
| <xref target="stateid" format="counter"/>, <xref target="layout_stateid" format="counter"/>, and | ||||
| <xref target="pnfs_operation_sequencing" format="counter"/> for a further | ||||
| discussion and requirements. | ||||
| </t> | ||||
| <t> | ||||
| The format of the returned layout (lo_content) | ||||
| is specific to the layout type. | ||||
| The value of the layout type (lo_content.loc_type) for each of | ||||
| the elements of the array of layouts returned by the metadata server | ||||
| (logr_layout) <bcp14>MUST</bcp14> be equal to the loga_layout_type specified | ||||
| by the client. If it is not equal, the client <bcp14>SHOULD</bcp14> ignore | ||||
| the response as invalid and behave as if the metadata server returned | ||||
| an error, even if the client does have support for the | ||||
| layout type returned. | ||||
| </t> | ||||
| <t> | ||||
| If neither the requested file nor its | ||||
| containing file system support layouts, the metadata server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, | ||||
| the metadata server <bcp14>MUST</bcp14> return NFS4ERR_UNKNOWN_LAYOUTTYPE. | ||||
| If layouts are supported but no layout matches the client | ||||
| provided layout identification, the metadata server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or a | ||||
| loga_iomode of LAYOUTIOMODE4_ANY is specified, the metadata server <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_BADIOMODE. | ||||
| </t> | ||||
| <t> | ||||
| If the layout for the file is unavailable due to transient | ||||
| conditions, e.g., file sharing prohibits layouts, the metadata server <bcp14>MUST</bcp14> | ||||
| return NFS4ERR_LAYOUTTRYLATER. | ||||
| </t> | ||||
| <t> | ||||
| If the layout request is rejected due to an overlapping layout | ||||
| recall, the metadata server <bcp14>MUST</bcp14> return NFS4ERR_RECALLCONFLICT. See <xref target="pnfs_operation_sequencing" format="default"/> for details. | ||||
| </t> | ||||
| <t> | ||||
| If the layout conflicts with a mandatory byte-range lock held on the | ||||
| file, and if the storage devices have no method of enforcing | ||||
| mandatory locks, other than through the restriction of layouts, the | ||||
| metadata server <bcp14>SHOULD</bcp14> return NFS4ERR_LOCKED. | ||||
| </t> | ||||
| <t> | ||||
| If the client sets loga_signal_layout_avail to TRUE, then it is | ||||
| registering with the metadata server a "want" for a layout in the event | ||||
| the layout cannot be obtained due to resource exhaustion. | ||||
| If the metadata server supports and will honor the "want", | ||||
| the results will have logr_will_signal_layout_avail | ||||
| set to TRUE. | ||||
| If so, the client should expect a CB_RECALLABLE_OBJ_AVAIL | ||||
| operation to indicate that a layout is available. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value and the | ||||
| current stateid is updated to match the value as returned in the | ||||
| results. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTGET_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Typically, LAYOUTGET will be called as part of a | ||||
| COMPOUND request after an OPEN operation and results | ||||
| in the client having location information for the | ||||
| file. This requires that loga_stateid be set to the | ||||
| special stateid that tells the metadata server to use the | ||||
| current stateid, which is set by OPEN (see <xref target="current_stateid" format="default"/>). A client may also hold | ||||
| a layout across multiple OPENs. The client specifies | ||||
| a layout type that limits what kind of layout the | ||||
| metadata server will return. This prevents metadata servers from | ||||
| granting layouts that are unusable by the client. | ||||
| </t> | ||||
| <t> | ||||
| As indicated by <xref target="layout_hell" format="default"/> and | ||||
| <xref target="layout_hell2" format="default"/>, the specification of | ||||
| LAYOUTGET allows a pNFS client and server considerable | ||||
| flexibility. | ||||
| A pNFS client can take several strategies for sending | ||||
| LAYOUTGET. Some examples are as follows. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If LAYOUTGET is preceded by OPEN in the same | ||||
| COMPOUND request and the OPEN requests OPEN4_SHARE_ACCESS_READ access, | ||||
| the client might opt to request a _READ layout | ||||
| with loga_offset set to zero, loga_minlength set to | ||||
| zero, and loga_length set to NFS4_UINT64_MAX. If | ||||
| the file has space allocated to it, that space is | ||||
| striped over one or more storage devices, and there | ||||
| is either no conflicting layout or the concept of | ||||
| a conflicting layout does not apply to the pNFS | ||||
| server's layout type or implementation, then the | ||||
| metadata server might return a layout with a starting offset | ||||
| of zero, and a length equal to the length of the | ||||
| file, if not NFS4_UINT64_MAX. If the length of the | ||||
| file is not a multiple of the | ||||
| pNFS server's stripe | ||||
| width (see <xref target="file_layout_definitions" format="default"/> | ||||
| for a formal definition), the metadata server might round up | ||||
| the returned layout's length. | ||||
| </li> | ||||
| <li> | ||||
| If LAYOUTGET is preceded by OPEN in the same | ||||
| COMPOUND request, and the OPEN requests OPEN4_SHARE_ACCESS_WRITE access and does | ||||
| not truncate the file, the client might | ||||
| opt to request a _RW layout with loga_offset set | ||||
| to zero, loga_minlength set to zero, and loga_length | ||||
| set to the file's current length (if known), or | ||||
| NFS4_UINT64_MAX. As with the previous case, under | ||||
| some conditions the metadata server might return a layout | ||||
| that covers the entire length of the file or beyond. | ||||
| </li> | ||||
| <li> | ||||
| This strategy is as above, but the OPEN truncates the file. In this case, | ||||
| the client might anticipate it will be writing to the | ||||
| file from offset zero, and so loga_offset and loga_minlength | ||||
| are set to zero, and loga_length is set to the value of | ||||
| threshold4_write_iosize. The metadata server might return a layout | ||||
| from offset zero with a length at least as long as | ||||
| threshold4_write_iosize. | ||||
| </li> | ||||
| <li> | ||||
| A process on the client invokes a request to read | ||||
| from offset 10000 for length 50000. The client | ||||
| is using buffered I/O, and has buffer sizes of | ||||
| 4096 bytes. The client intends to map the request | ||||
| of the process into a series of READ requests | ||||
| starting at offset 8192. The end offset needs to be higher | ||||
| than 10000 + 50000 = 60000, and the next offset that is | ||||
| a multiple of 4096 is 61440. The difference between 61440 and | ||||
| the starting offset of the layout is 53248 (which is | ||||
| the product of 4096 and 13). | ||||
| The value | ||||
| of threshold4_read_iosize is less than 53248, | ||||
| so the client sends a LAYOUTGET request with | ||||
| loga_offset set to 8192, loga_minlength set to | ||||
| 53248, and loga_length set to the file's length | ||||
| (if known) minus 8192 or NFS4_UINT64_MAX (if the | ||||
| file's length is not known). Since this LAYOUTGET | ||||
| request exceeds the metadata server's threshold, it grants | ||||
| the layout, possibly with an initial offset of | ||||
| zero, with an end offset of at least 8192 + 53248 - | ||||
| 1 = 61439, but preferably a layout with an offset | ||||
| aligned on the stripe width and a length that is | ||||
| a multiple of the stripe width. | ||||
| </li> | ||||
| <li> | ||||
| This strategy is as above, but the client is not using buffered I/O, and | ||||
| instead all internal I/O requests are sent directly to | ||||
| the server. The LAYOUTGET request has loga_offset equal to | ||||
| 10000 and loga_minlength set to 50000. The value of loga_length | ||||
| is set to the length of the file. The metadata server is free to | ||||
| return a layout that fully overlaps the requested range, with | ||||
| a starting offset and length aligned on the stripe width. | ||||
| </li> | ||||
| <li> | ||||
| Again, a process on the client invokes a request | ||||
| to read from offset 10000 for length 50000 (i.e., a | ||||
| range with a starting offset of 10000 and an ending | ||||
| offset of 59999), and | ||||
| buffered I/O is in use. The client is expecting | ||||
| that the server might not be able to return the | ||||
| layout for the full I/O range. | ||||
| The client intends to map the request of the | ||||
| process into a series of thirteen READ requests starting at | ||||
| offset 8192, each with length 4096, with a total | ||||
| length of 53248 (which equals 13 * 4096), which | ||||
| fully contains the range that the client's process wants to read. | ||||
| Because the value of threshold4_read_iosize is equal to | ||||
| 4096, it is practical and reasonable for the client to | ||||
| use several LAYOUTGET operations to complete the series | ||||
| of READs. | ||||
| The client sends a LAYOUTGET request with | ||||
| loga_offset set to 8192, loga_minlength set to 4096, | ||||
| and loga_length set to 53248 or higher. The server | ||||
| will grant a layout possibly with an initial offset | ||||
| of zero, with an end offset of at least 8192 + 4096 - | ||||
| 1 = 12287, but preferably a layout with an offset | ||||
| aligned on the stripe width and a length that is a | ||||
| multiple of the stripe width. This will allow the | ||||
| client to make forward progress, possibly | ||||
| sending more LAYOUTGET operations for the remainder | ||||
| of the range. | ||||
| </li> | ||||
| <li> | ||||
| An NFS client detects a sequential read pattern, | ||||
| and so sends a LAYOUTGET operation that goes well beyond any | ||||
| current or pending read requests to the server. The | ||||
| server might likewise detect this pattern, and | ||||
| grant the LAYOUTGET request. Once the client | ||||
| reads from an offset of the file that represents | ||||
| 50% of the way through the range of the last layout | ||||
| it received, in order to avoid stalling I/O that would wait | ||||
| for a layout, the client sends more LAYOUTGET operations | ||||
| from an offset of the file that represents 50% | ||||
| of the way through the last layout it received. The client | ||||
| continues to request layouts with byte-ranges that are | ||||
| well in advance of the byte-ranges of | ||||
| recent and/or pending read requests of processes running on the client. | ||||
| </li> | ||||
| <li> | ||||
| This strategy is as above, except that the client fails to detect the | ||||
| pattern, whereas the server does. The next time the | ||||
| metadata server gets a LAYOUTGET, it returns a layout with | ||||
| a length that is well beyond loga_minlength. | ||||
| </li> | ||||
| <li> | ||||
| A client is using buffered I/O, and has a long | ||||
| queue of write-behinds to process and also detects | ||||
| a sequential write pattern. It sends a LAYOUTGET | ||||
| for a layout that spans the range of the queued | ||||
| write-behinds and well beyond, including ranges | ||||
| beyond the file's current length. The client | ||||
| continues to send LAYOUTGET operations once the write-behind | ||||
| queue reaches 50% of the maximum queue length. | ||||
| </li> | ||||
| </ul> | ||||
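| <t> | ||||
| The buffer-alignment arithmetic used in the buffered I/O examples above | ||||
| can be checked with a short sketch. The Python below is illustrative | ||||
| only and is not part of the protocol; the function name | ||||
| aligned_read_range and its parameters are hypothetical. It rounds the | ||||
| start of the request down and the end of the request up to the buffer | ||||
| size, giving values a client might use for loga_offset and | ||||
| loga_minlength. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def aligned_read_range(offset, length, buf_size):
    """Return (start, span, n_reads) for buffer-aligned READs that
    fully cover the byte range [offset, offset + length).

    Hypothetical helper: the names are not defined by the protocol.
    """
    # Round the starting offset down to a multiple of the buffer size.
    start = (offset // buf_size) * buf_size
    # Round the exclusive end offset up to a multiple of the buffer size.
    end = -(-(offset + length) // buf_size) * buf_size
    span = end - start
    return start, span, span // buf_size


# The example from the text: read 50000 bytes at offset 10000
# using 4096-byte buffers.
start, span, n_reads = aligned_read_range(10000, 50000, 4096)
print(start, span, n_reads)
```
| ]]></sourcecode> | ||||
| <t> | ||||
| For the request in the text, this yields a starting offset of 8192, | ||||
| a span of 53248 bytes, and thirteen READ requests of 4096 bytes each. | ||||
| </t> | ||||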
| <t> | ||||
| Once the client has obtained a layout referring to a | ||||
| particular device ID, the metadata server <bcp14>MUST NOT</bcp14> | ||||
| delete the device ID until the layout is returned | ||||
| or revoked. | ||||
| </t> | ||||
| <t> | ||||
| CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race | ||||
| scenario is that LAYOUTGET returns a device ID for which the | ||||
| client does not have device address mappings, | ||||
| and the metadata server sends a CB_NOTIFY_DEVICEID | ||||
| to add the device ID to the client's awareness | ||||
| and meanwhile the client sends GETDEVICEINFO on | ||||
| the device ID. This scenario is discussed in | ||||
| <xref target="OP_GETDEVICEINFO_IMPLEMENTATION" format="default"/>. | ||||
| Another scenario is that the CB_NOTIFY_DEVICEID | ||||
| is processed by the client before it processes | ||||
| the results from LAYOUTGET. The client will send | ||||
| a GETDEVICEINFO on the device ID. If the results | ||||
| from GETDEVICEINFO are received before the client | ||||
| gets results from LAYOUTGET, then there is no | ||||
| longer a race. If the results from LAYOUTGET are | ||||
| received before the results from GETDEVICEINFO, the | ||||
| client can either wait for results of GETDEVICEINFO | ||||
| or send another one to get possibly more up-to-date | ||||
| device address mappings for the device ID. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_LAYOUTRETURN" numbered="true" toc="default"> | ||||
| <name>Operation 51: LAYOUTRETURN - Release Layout Information</name> | ||||
| <section toc="exclude" anchor="OP_LAYOUTRETURN_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ | ||||
| const LAYOUT4_RET_REC_FILE = 1; | ||||
| const LAYOUT4_RET_REC_FSID = 2; | ||||
| const LAYOUT4_RET_REC_ALL = 3; | ||||
| enum layoutreturn_type4 { | ||||
| LAYOUTRETURN4_FILE = LAYOUT4_RET_REC_FILE, | ||||
| LAYOUTRETURN4_FSID = LAYOUT4_RET_REC_FSID, | ||||
| LAYOUTRETURN4_ALL = LAYOUT4_RET_REC_ALL | ||||
| }; | ||||
| struct layoutreturn_file4 { | ||||
| offset4 lrf_offset; | ||||
| length4 lrf_length; | ||||
| stateid4 lrf_stateid; | ||||
| /* layouttype4 specific data */ | ||||
| opaque lrf_body<>; | ||||
| }; | ||||
| union layoutreturn4 switch(layoutreturn_type4 lr_returntype) { | ||||
| case LAYOUTRETURN4_FILE: | ||||
| layoutreturn_file4 lr_layout; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| struct LAYOUTRETURN4args { | ||||
| /* CURRENT_FH: file */ | ||||
| bool lora_reclaim; | ||||
| layouttype4 lora_layout_type; | ||||
| layoutiomode4 lora_iomode; | ||||
| layoutreturn4 lora_layoutreturn; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTRETURN_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union layoutreturn_stateid switch (bool lrs_present) { | ||||
| case TRUE: | ||||
| stateid4 lrs_stateid; | ||||
| case FALSE: | ||||
| void; | ||||
| }; | ||||
| union LAYOUTRETURN4res switch (nfsstat4 lorr_status) { | ||||
| case NFS4_OK: | ||||
| layoutreturn_stateid lorr_stateid; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTRETURN_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation returns from the client to the server | ||||
| one or more layouts represented by the client ID | ||||
| (derived from the session ID in the preceding SEQUENCE | ||||
| operation), lora_layout_type, and lora_iomode. | ||||
| When lr_returntype is LAYOUTRETURN4_FILE, the | ||||
| returned layout is further identified by the current | ||||
| filehandle, lrf_offset, lrf_length, and lrf_stateid. | ||||
| If the lrf_length field is NFS4_UINT64_MAX, all bytes | ||||
| of the layout, starting at lrf_offset, are returned. | ||||
| When lr_returntype is LAYOUTRETURN4_FSID, the | ||||
| current filehandle is used to identify the file | ||||
| system and all layouts matching the client ID, | ||||
| the fsid of the file system, lora_layout_type, and | ||||
| lora_iomode are returned. When lr_returntype is | ||||
| LAYOUTRETURN4_ALL, all layouts matching the client | ||||
| ID, lora_layout_type, and lora_iomode are returned | ||||
| and the current filehandle is not used. After this | ||||
| call, the client <bcp14>MUST NOT</bcp14> use the returned layout(s) | ||||
| and the associated storage protocol to access the | ||||
| file data. | ||||
| </t> | ||||
| <t> | ||||
| If the set of layouts designated in the case of | ||||
| LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL is empty, then no error | ||||
| results. In the case of LAYOUTRETURN4_FILE, the byte-range | ||||
| specified is returned even if it is a subdivision of a layout | ||||
| previously obtained with LAYOUTGET, a combination of multiple | ||||
| layouts previously obtained with LAYOUTGET, or a combination | ||||
| including some layouts previously obtained with LAYOUTGET, | ||||
| and one or more subdivisions of such layouts. When the | ||||
| byte-range does not designate any bytes for which a layout | ||||
| is held for the specified file, client ID, layout type and | ||||
| mode, no error results. | ||||
| See <xref target="bulk_layouts" format="default"/> for considerations with | ||||
| "bulk" return of layouts. | ||||
| </t> | ||||
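| <t> | ||||
| As a rough illustration of the byte-range semantics of | ||||
| LAYOUTRETURN4_FILE described above, the following Python sketch removes | ||||
| a returned range from a set of held layout segments. It is a | ||||
| simplification under stated assumptions: the function names and the | ||||
| representation of segments as (offset, length) pairs are hypothetical, | ||||
| and stateids, layout types, and iomodes are ignored. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def _end(offset, length):
    # Exclusive end offset; a length of NFS4_UINT64_MAX means
    # "all bytes starting at offset".
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX + 1
    return offset + length


def return_range(held, lrf_offset, lrf_length):
    """Subtract the returned range from the held segments.

    held is a list of (offset, length) pairs (a hypothetical
    representation). The result is a list of (start, end) pairs with
    exclusive ends. A range that matches nothing is not an error.
    """
    ret_start, ret_end = lrf_offset, _end(lrf_offset, lrf_length)
    remaining = []
    for off, length in held:
        seg_start, seg_end = off, _end(off, length)
        if seg_start < ret_start:
            # Part of the segment before the returned range survives.
            remaining.append((seg_start, min(seg_end, ret_start)))
        if seg_end > ret_end:
            # Part of the segment after the returned range survives.
            remaining.append((max(seg_start, ret_end), seg_end))
    return remaining


# Two adjacent 8192-byte layouts; return a range that straddles both.
print(return_range([(0, 8192), (8192, 8192)], 4096, 8192))
```
| ]]></sourcecode> | ||||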
| <t> | ||||
| The layout being returned may be a subset | ||||
| or superset of a layout specified by CB_LAYOUTRECALL. However, | ||||
| if it is a subset, the recall is not complete until the full | ||||
| recalled scope has been returned. Recalled scope refers to the | ||||
| byte-range in the case of LAYOUTRETURN4_FILE, the use of | ||||
| LAYOUTRETURN4_FSID, or the use of LAYOUTRETURN4_ALL. There must | ||||
| be a LAYOUTRETURN with a matching scope to complete the return | ||||
| even if all current layout ranges have been previously individually | ||||
| returned. | ||||
| </t> | ||||
| <t> | ||||
| For all lr_returntype values, an iomode of LAYOUTIOMODE4_ANY | ||||
| specifies that all layouts that match the other arguments to | ||||
| LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of | ||||
| current filehandle and range; fsid derived from current | ||||
| filehandle; or LAYOUTRETURN4_ALL) are being returned. | ||||
| </t> | ||||
| <t> | ||||
| In the case that lr_returntype is LAYOUTRETURN4_FILE, the | ||||
| lrf_stateid provided by the client is a layout stateid as | ||||
| returned from previous layout operations. Note that the "seqid" | ||||
| field of lrf_stateid <bcp14>MUST NOT</bcp14> be zero. See Sections | ||||
| <xref target="stateid" format="counter"/>, <xref target="layout_stateid" format="counter"/>, and | ||||
| <xref target="pnfs_operation_sequencing" format="counter"/> for a further | ||||
| discussion and requirements. | ||||
| </t> | ||||
| <t> | ||||
| Return of a layout or all layouts does not invalidate the | ||||
| mapping of storage device ID to a storage device address. The | ||||
| mapping remains in effect until specifically changed or deleted via | ||||
| device ID notification callbacks. | ||||
| Of course, if there are no remaining | ||||
| layouts that refer to a previously used device ID, the server is | ||||
| free to delete a device ID without a notification callback, which | ||||
| will be the case when notifications are not in effect. | ||||
| </t> | ||||
| <t> | ||||
| If the lora_reclaim field is set to TRUE, the | ||||
| client is attempting to return a layout that | ||||
| was acquired before the restart of the metadata | ||||
| server during the metadata server's grace period. | ||||
| When returning layouts that were acquired during | ||||
| the metadata server's grace period, the client <bcp14>MUST</bcp14> set the | ||||
| lora_reclaim field to FALSE. The lora_reclaim field | ||||
| <bcp14>MUST</bcp14> also be set to FALSE when lr_returntype is | ||||
| LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL. See <xref target="OP_LAYOUTCOMMIT" format="default">LAYOUTCOMMIT </xref> for | ||||
| more details. | ||||
| </t> | ||||
| <t> | ||||
| Layouts may be returned when recalled or voluntarily (i.e., | ||||
| before the server has recalled them). In either case, the client | ||||
| must properly propagate state changed under the context of the | ||||
| layout to the storage device(s) or to the metadata server before | ||||
| returning the layout. | ||||
| </t> | ||||
| <t> | ||||
| If the client returns the layout in response to a | ||||
| CB_LAYOUTRECALL where the lor_recalltype field of the | ||||
| clora_recall field was LAYOUTRECALL4_FILE, the client | ||||
| should use the lor_stateid value from CB_LAYOUTRECALL | ||||
| as the value for lrf_stateid. Otherwise, it should | ||||
| use logr_stateid (from a previous LAYOUTGET result) | ||||
| or lorr_stateid (from a previous LAYOUTRETURN result). | ||||
| This is done to indicate the point in time (in terms | ||||
| of layout stateid transitions) when the recall was | ||||
| sent. The client uses the precise lora_recallstateid | ||||
| value and <bcp14>MUST NOT</bcp14> set the stateid's seqid to | ||||
| zero; otherwise, NFS4ERR_BAD_STATEID <bcp14>MUST</bcp14> be | ||||
| returned. NFS4ERR_OLD_STATEID can be returned if | ||||
| the client is using an old seqid, and the server | ||||
| knows the client should not be using the old | ||||
| seqid. For example, the client uses the seqid on slot 1 of | ||||
| the session, receives the response with the new | ||||
| seqid, and uses the slot to send another request | ||||
| with the old seqid. | ||||
| </t> | ||||
| <t> | ||||
| If a client fails to return a layout | ||||
| in a timely manner, then the metadata server <bcp14>SHOULD</bcp14> use its | ||||
| control protocol with the storage devices to fence the client | ||||
| from accessing the data referenced by the layout. See | ||||
| <xref target="recalling_layout" format="default"/> for more details. | ||||
| </t> | ||||
| <t> | ||||
| If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after | ||||
| the metadata server's grace period, NFS4ERR_NO_GRACE is returned. | ||||
| </t> | ||||
| <t> | ||||
| If the LAYOUTRETURN request sets the lora_reclaim field to TRUE | ||||
| and lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, | ||||
| NFS4ERR_INVAL is returned. | ||||
| </t> | ||||
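| <t> | ||||
| The two lora_reclaim error cases above can be summarized in a small | ||||
| sketch. The Python below is illustrative only: the function name, the | ||||
| in_grace parameter, and the use of strings for status codes are | ||||
| hypothetical, and the order in which the two checks are applied is an | ||||
| assumption, since the text does not specify which error takes | ||||
| precedence. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Values taken from the layoutreturn_type4 enumeration.
LAYOUTRETURN4_FILE = 1
LAYOUTRETURN4_FSID = 2
LAYOUTRETURN4_ALL = 3


def check_layoutreturn_reclaim(lora_reclaim, lr_returntype, in_grace):
    """Return the status name a server might produce for the reclaim
    checks alone (hypothetical helper; other checks are omitted)."""
    if lora_reclaim and lr_returntype in (LAYOUTRETURN4_FSID,
                                          LAYOUTRETURN4_ALL):
        # Reclaim is only meaningful for LAYOUTRETURN4_FILE.
        return "NFS4ERR_INVAL"
    if lora_reclaim and not in_grace:
        # Reclaim attempted after the grace period has ended.
        return "NFS4ERR_NO_GRACE"
    return "NFS4_OK"


print(check_layoutreturn_reclaim(True, LAYOUTRETURN4_FILE, in_grace=False))
```
| ]]></sourcecode> | ||||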
| <t> | ||||
| If the client sets the lr_returntype field to | ||||
| LAYOUTRETURN4_FILE, then the lrs_stateid field | ||||
| will represent the layout stateid as updated for | ||||
| this operation's processing; the current stateid | ||||
| will also be updated to match the returned value. | ||||
| If the last byte of any layout for the current | ||||
| file, client ID, and layout type is being returned | ||||
| and there are no remaining pending CB_LAYOUTRECALL | ||||
| operations for which a LAYOUTRETURN operation must be | ||||
| done, lrs_present <bcp14>MUST</bcp14> be FALSE, and no stateid | ||||
| will be returned. In addition, the COMPOUND request's current | ||||
| stateid will be set to the all-zeroes special stateid | ||||
| (see <xref target="current_stateid" format="default"/>). The server | ||||
| <bcp14>MUST</bcp14> reject with NFS4ERR_BAD_STATEID any further | ||||
| use of the current stateid in that COMPOUND until | ||||
| the current stateid is re-established by a later | ||||
| stateid-returning operation. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle retains its value. | ||||
| </t> | ||||
| <t> | ||||
| If the EXCHGID4_FLAG_BIND_PRINC_STATEID | ||||
| capability is set on the client ID (see <xref target="OP_EXCHANGE_ID" format="default"/>), the server will | ||||
| require that the principal, security flavor, | ||||
| and if applicable, the GSS mechanism, combination | ||||
| that acquired the layout also be the one to send | ||||
| LAYOUTRETURN. This might not be possible | ||||
| if credentials for the principal are no | ||||
| longer available. The server will allow the | ||||
| machine credential or SSV credential (see <xref target="OP_EXCHANGE_ID" format="default"/>) to send LAYOUTRETURN | ||||
| if LAYOUTRETURN's operation code was set in the | ||||
| spo_must_allow result of EXCHANGE_ID. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_LAYOUTRETURN_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL | ||||
| callback <bcp14>MUST</bcp14> be serialized with any outstanding, intersecting | ||||
| LAYOUTRETURN operations. Note that it is possible that while a | ||||
| client is returning the layout for some recalled range, the server | ||||
| may recall a superset of that range (e.g., LAYOUTRECALL4_ALL); the final | ||||
| return operation for the latter must block until the former layout | ||||
| recall is done. | ||||
| </t> | ||||
| <t> | ||||
| Returning all layouts in a file system using LAYOUTRETURN4_FSID is | ||||
| typically done in response to a CB_LAYOUTRECALL for that file system | ||||
| as the final return operation. Similarly, LAYOUTRETURN4_ALL | ||||
| is used in response to a recall callback for all layouts. It is | ||||
| possible that the client already returned some outstanding layouts | ||||
| via individual LAYOUTRETURN calls and the call for | ||||
| LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL marks the end of the | ||||
| LAYOUTRETURN sequence. See <xref target="recall_robustness" format="default"/> | ||||
| for more details. | ||||
| </t> | ||||
| <t> | ||||
| Once the client has returned all layouts referring to a particular | ||||
| device ID, the server <bcp14>MAY</bcp14> delete the device ID. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SECINFO_NO_NAME" numbered="true" toc="default"> | ||||
| <name>Operation 52: SECINFO_NO_NAME - Get Security on Unnamed Object</name> | ||||
| <section toc="exclude" anchor="OP_SECINFO_NO_NAME_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum secinfo_style4 { | ||||
| SECINFO_STYLE4_CURRENT_FH = 0, | ||||
| SECINFO_STYLE4_PARENT = 1 | ||||
| }; | ||||
| /* CURRENT_FH: object or child directory */ | ||||
| typedef secinfo_style4 SECINFO_NO_NAME4args; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_NO_NAME_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* CURRENT_FH: consumed if status is NFS4_OK */ | ||||
| typedef SECINFO4res SECINFO_NO_NAME4res; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_NO_NAME_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| Like the SECINFO operation, SECINFO_NO_NAME is used by the | ||||
| client to obtain a list of valid RPC authentication flavors for | ||||
| a specific file object. Unlike SECINFO, SECINFO_NO_NAME only | ||||
| works with objects that are accessed by filehandle. | ||||
| </t> | ||||
| <t> | ||||
| There are two styles of SECINFO_NO_NAME, as determined by the | ||||
| value of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is | ||||
| passed, then SECINFO_NO_NAME is querying for the required | ||||
| security for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then | ||||
| SECINFO_NO_NAME is querying for the required security of the | ||||
| current filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, | ||||
| then SECINFO_NO_NAME should apply the same access methodology used for | ||||
| LOOKUPP when evaluating the traversal to the parent directory. | ||||
| Therefore, if the requester does not have the appropriate access | ||||
| to LOOKUPP the parent, then SECINFO_NO_NAME must behave the same | ||||
| way and return NFS4ERR_ACCESS. | ||||
| </t> | ||||
| <t> | ||||
| If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH returns | ||||
| NFS4ERR_WRONGSEC, then the client resolves the | ||||
| situation by sending a COMPOUND request that consists of | ||||
| PUTFH, PUTPUBFH, or PUTROOTFH immediately followed by | ||||
| SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH. | ||||
| See <xref target="Security_Service_Negotiation" format="default"/> | ||||
| for instructions on dealing with NFS4ERR_WRONGSEC error | ||||
| returns from PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH. | ||||
| </t> | ||||
| <t> | ||||
| If SECINFO_STYLE4_PARENT is specified and there is no parent | ||||
| directory, SECINFO_NO_NAME <bcp14>MUST</bcp14> return NFS4ERR_NOENT. | ||||
| </t> | ||||
| <t> | ||||
| On success, the current filehandle is consumed | ||||
| (see <xref target="aftersecinfo" format="default"/>), and if the | ||||
| next operation after SECINFO_NO_NAME tries to use | ||||
| the current filehandle, that operation will fail | ||||
| with the status NFS4ERR_NOFILEHANDLE. | ||||
| </t> | ||||
| <t> | ||||
| Everything else about SECINFO_NO_NAME is the same as SECINFO. | ||||
| See the discussion on SECINFO (<xref target="OP_SECINFO_DESCRIPTION" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SECINFO_NO_NAME_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| See the discussion on SECINFO (<xref target="OP_SECINFO_IMPLEMENTATION" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SEQUENCE" numbered="true" toc="default"> | ||||
| <name>Operation 53: SEQUENCE - Supply Per-Procedure Sequencing and Control</name> | ||||
| <section toc="exclude" anchor="OP_SEQUENCE_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct SEQUENCE4args { | ||||
| sessionid4 sa_sessionid; | ||||
| sequenceid4 sa_sequenceid; | ||||
| slotid4 sa_slotid; | ||||
| slotid4 sa_highest_slotid; | ||||
| bool sa_cachethis; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SEQUENCE_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const SEQ4_STATUS_CB_PATH_DOWN = 0x00000001; | ||||
| const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002; | ||||
| const SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004; | ||||
| const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008; | ||||
| const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010; | ||||
| const SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020; | ||||
| const SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040; | ||||
| const SEQ4_STATUS_LEASE_MOVED = 0x00000080; | ||||
| const SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100; | ||||
| const SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200; | ||||
| const SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400; | ||||
| const SEQ4_STATUS_DEVID_CHANGED = 0x00000800; | ||||
| const SEQ4_STATUS_DEVID_DELETED = 0x00001000; | ||||
| struct SEQUENCE4resok { | ||||
| sessionid4 sr_sessionid; | ||||
| sequenceid4 sr_sequenceid; | ||||
| slotid4 sr_slotid; | ||||
| slotid4 sr_highest_slotid; | ||||
| slotid4 sr_target_highest_slotid; | ||||
| uint32_t sr_status_flags; | ||||
| }; | ||||
| union SEQUENCE4res switch (nfsstat4 sr_status) { | ||||
| case NFS4_OK: | ||||
| SEQUENCE4resok sr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SEQUENCE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The SEQUENCE operation is | ||||
| used by the server to implement session request control | ||||
| and the reply cache semantics. | ||||
| </t> | ||||
| <t> | ||||
| SEQUENCE <bcp14>MUST</bcp14> appear as the first operation of any COMPOUND | ||||
| in which it appears. The error NFS4ERR_SEQUENCE_POS will be | ||||
| returned when it is found in any position in a COMPOUND | ||||
| beyond the first. Operations other than SEQUENCE, BIND_CONN_TO_SESSION, | ||||
| EXCHANGE_ID, CREATE_SESSION, and DESTROY_SESSION, | ||||
| <bcp14>MUST NOT</bcp14> appear as the first operation in a | ||||
| COMPOUND. Such operations <bcp14>MUST</bcp14> yield the error NFS4ERR_OP_NOT_IN_SESSION | ||||
| if they do appear at the start of a COMPOUND. | ||||
| </t> | ||||
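| <t> | ||||
| The positional rules above can be sketched as a simple check over the | ||||
| operation names in a COMPOUND. This Python is a minimal illustration, | ||||
| not a server implementation: the function name, the use of strings for | ||||
| operations and status codes, and the treatment of an empty COMPOUND | ||||
| are assumptions. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Operations permitted as the first operation of a COMPOUND.
ALLOWED_FIRST = {
    "SEQUENCE",
    "BIND_CONN_TO_SESSION",
    "EXCHANGE_ID",
    "CREATE_SESSION",
    "DESTROY_SESSION",
}


def check_compound_positions(ops):
    """Return the status name implied by the positional rules alone.

    ops is a list of operation names (hypothetical representation).
    """
    if not ops:
        # Assumption: an empty COMPOUND violates neither rule.
        return "NFS4_OK"
    if ops[0] not in ALLOWED_FIRST:
        return "NFS4ERR_OP_NOT_IN_SESSION"
    if "SEQUENCE" in ops[1:]:
        # SEQUENCE found in a position beyond the first.
        return "NFS4ERR_SEQUENCE_POS"
    return "NFS4_OK"


print(check_compound_positions(["SEQUENCE", "PUTFH", "LAYOUTRETURN"]))
```
| ]]></sourcecode> | ||||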
| <t> | ||||
| If SEQUENCE is received on a connection not associated with the | ||||
| session via CREATE_SESSION or BIND_CONN_TO_SESSION, and | ||||
| connection association enforcement is enabled | ||||
| (see <xref target="OP_EXCHANGE_ID" format="default"/>), then | ||||
| the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| The sa_sessionid argument identifies the session to which this | ||||
| request applies. The sr_sessionid result <bcp14>MUST</bcp14> equal | ||||
| sa_sessionid. | ||||
| </t> | ||||
| <t> | ||||
| The sa_slotid argument is the index in the reply cache | ||||
| for the request. The sa_sequenceid field is the sequence | ||||
| number of the request for the reply cache entry (slot). | ||||
| The sr_slotid result <bcp14>MUST</bcp14> equal sa_slotid. The sr_sequenceid | ||||
| result <bcp14>MUST</bcp14> equal sa_sequenceid. | ||||
| </t> | ||||
| <t> | ||||
| The sa_highest_slotid argument is the highest slot ID | ||||
| for which the client has a request outstanding; it could be | ||||
| equal to sa_slotid. | ||||
| The server returns two "highest_slotid" values: sr_highest_slotid | ||||
| and sr_target_highest_slotid. The former is the highest slot ID | ||||
| the server will accept in a future SEQUENCE operation, and | ||||
| <bcp14>SHOULD NOT</bcp14> be less than the value of sa_highest_slotid | ||||
| (but see | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/> | ||||
| for an exception). | ||||
| The latter is the highest slot ID the server would prefer the | ||||
| client use on a future SEQUENCE operation. | ||||
| </t> | ||||
| <t> | ||||
| If sa_cachethis is TRUE, then the client is requesting that | ||||
| the server cache the entire | ||||
| reply in the server's reply cache; therefore, the server <bcp14>MUST</bcp14> | ||||
| cache the reply (see <xref target="optional_reply_caching" format="default"/>). | ||||
| The server <bcp14>MAY</bcp14> cache the reply if sa_cachethis is FALSE. | ||||
| If the server does not cache the entire reply, it | ||||
| <bcp14>MUST</bcp14> still record that it executed the request at | ||||
| the specified slot and sequence ID. | ||||
| </t> | ||||
| <t> | ||||
| The response to the SEQUENCE operation contains a | ||||
| word of status flags (sr_status_flags) that can | ||||
| provide to the client information related to the | ||||
| status of the client's lock state and communications | ||||
| paths. Note that any status bits relating to lock | ||||
| state <bcp14>MAY</bcp14> be reset when lock state is lost due to a | ||||
| server restart (even if the session is persistent across | ||||
| restarts; session persistence does not imply | ||||
| lock state persistence) | ||||
| or the establishment of a new client | ||||
| instance. | ||||
| </t> | ||||
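| <t> | ||||
| As an illustration of how a client might inspect sr_status_flags, the | ||||
| following Python sketch tests each bit defined in the RESULT section | ||||
| and lists the names of those that are set. The table and function | ||||
| names are hypothetical; only the constant names and values come from | ||||
| the XDR above. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Bit values copied from the SEQ4_STATUS_* constants in the XDR.
SEQ4_STATUS_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000002: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING",
    0x00000004: "SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED",
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
}


def decode_status_flags(sr_status_flags):
    """Return the names of all status bits set in sr_status_flags,
    in ascending bit order (hypothetical helper)."""
    return [name for bit, name in SEQ4_STATUS_FLAGS.items()
            if sr_status_flags & bit]


# Both backchannel-down bits set.
print(decode_status_flags(0x00000201))
```
| ]]></sourcecode> | ||||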
| <dl newline="true" spacing="normal"> | ||||
| <dt>SEQ4_STATUS_CB_PATH_DOWN</dt> | ||||
| <dd> | ||||
| When set, indicates that the client has no | ||||
| operational backchannel path for any session | ||||
| associated with the client ID, making it | ||||
| necessary for the client to re-establish one. | ||||
| This bit | ||||
| remains set on all SEQUENCE responses on all sessions | ||||
| associated with the client ID | ||||
| until at least one backchannel is | ||||
| available on any session associated with the client ID. | ||||
| If the client fails to re-establish a | ||||
| backchannel for the client ID, it is subject to | ||||
| having recallable state revoked. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_CB_PATH_DOWN_SESSION</dt> | ||||
| <dd> | ||||
| When set, indicates that the session has | ||||
| no operational backchannel. There are two reasons | ||||
| why SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not | ||||
| SEQ4_STATUS_CB_PATH_DOWN. First is that a callback operation | ||||
| that applies specifically to the | ||||
| session (e.g., CB_RECALL_SLOT, see <xref target="OP_CB_RECALL_SLOT" format="default"/>) needs to be sent. | ||||
| Second is that the server did send a callback operation, | ||||
| but the connection was lost before the reply. The | ||||
| server cannot be sure whether or not the client received the | ||||
| callback operation, and so, per rules on | ||||
| request retry, the server <bcp14>MUST</bcp14> retry the callback | ||||
| operation over the same session. The | ||||
| SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the indication | ||||
| to the client that it needs to associate a connection | ||||
| to the session's backchannel. | ||||
| This bit remains set on all SEQUENCE responses of the | ||||
| session until a connection is associated with the | ||||
| session's backchannel. | ||||
| If the client fails to re-establish a | ||||
| backchannel for the session, it is subject to | ||||
| having recallable state revoked. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING</dt> | ||||
| <dd> | ||||
| <t> | ||||
| When set, indicates that all GSS contexts or RPCSEC_GSS handles | ||||
| assigned to the session's backchannel will expire within a | ||||
| period equal to the lease time. This bit remains set on all | ||||
| SEQUENCE replies until at least one of the following is true: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| All SSV RPCSEC_GSS handles on the session's backchannel | ||||
| have been destroyed and all non-SSV GSS contexts have expired. | ||||
| </li> | ||||
| <li> | ||||
| At least one more SSV RPCSEC_GSS handle has been added to | ||||
| the backchannel. | ||||
| </li> | ||||
| <li> | ||||
| The expiration time of at least one non-SSV GSS context | ||||
| of an RPCSEC_GSS handle | ||||
| is beyond the lease period from the current | ||||
| time (relative to the time of when a SEQUENCE | ||||
| response was sent). | ||||
| </li> | ||||
| </ul> | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED</dt> | ||||
| <dd> | ||||
| When set, indicates all non-SSV GSS contexts and all | ||||
| SSV RPCSEC_GSS handles assigned | ||||
| to the session's backchannel have expired or have been | ||||
| destroyed. | ||||
| This bit remains set on all SEQUENCE replies | ||||
| until at least one non-expired non-SSV GSS context for the | ||||
| session's backchannel has been established or at least one | ||||
| SSV RPCSEC_GSS handle has been assigned to the backchannel. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED</dt> | ||||
| <dd> | ||||
| When set, indicates that the lease has expired | ||||
| and as a result the server released all of the | ||||
| client's locking state. This status bit remains | ||||
| set on all SEQUENCE replies until the loss of | ||||
| all such locks has been acknowledged by use of | ||||
| FREE_STATEID (see <xref target="OP_FREE_STATEID" format="default"/>), or by establishing a new client instance by | ||||
| destroying all sessions (via DESTROY_SESSION), | ||||
| the client ID (via DESTROY_CLIENTID), and then | ||||
| invoking EXCHANGE_ID and CREATE_SESSION to | ||||
| establish a new client ID. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED</dt> | ||||
| <dd> | ||||
| When set, indicates that some subset of the client's locks | ||||
| have been revoked due to expiration of the lease period | ||||
| followed by another client's conflicting LOCK operation. | ||||
| This status bit remains set on all SEQUENCE replies | ||||
| until the loss of all | ||||
| such locks has been acknowledged by use of FREE_STATEID. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_ADMIN_STATE_REVOKED</dt> | ||||
| <dd> | ||||
| When set, indicates that one or more locks have been revoked | ||||
| without expiration of the lease period, due to administrative | ||||
| action. This status bit remains set on all SEQUENCE replies | ||||
| until the loss of all | ||||
| such locks has been acknowledged by use of FREE_STATEID. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_RECALLABLE_STATE_REVOKED</dt> | ||||
| <dd> | ||||
| When set, indicates that one or more recallable | ||||
| objects have been revoked without expiration | ||||
| of the lease period, due to the client's | ||||
| failure to return them when recalled, which | ||||
| may be a consequence of there being no working | ||||
| backchannel and the client failing to re-establish | ||||
| a backchannel per the SEQ4_STATUS_CB_PATH_DOWN, | ||||
| SEQ4_STATUS_CB_PATH_DOWN_SESSION, or | ||||
| SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags. | ||||
| This status bit remains set on all SEQUENCE | ||||
| replies until the loss of all such locks has | ||||
| been acknowledged by use of FREE_STATEID. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_LEASE_MOVED</dt> | ||||
| <dd> | ||||
| When set, indicates that responsibility for lease renewal has | ||||
| been transferred to one or more new servers. This condition | ||||
| will continue until the client receives an NFS4ERR_MOVED | ||||
| error and the server receives the subsequent GETATTR for the | ||||
| fs_locations or fs_locations_info attribute for an access to | ||||
| each file system for which a lease has been moved to a new | ||||
| server. See <xref target="transferred_lease" format="default"/>. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_RESTART_RECLAIM_NEEDED</dt> | ||||
| <dd> | ||||
| When set, indicates that due to server | ||||
| restart, the client must reclaim locking state. | ||||
| Until the client sends a global RECLAIM_COMPLETE | ||||
| (<xref target="OP_RECLAIM_COMPLETE" format="default"/>), every | ||||
| SEQUENCE operation will return | ||||
| SEQ4_STATUS_RESTART_RECLAIM_NEEDED. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_BACKCHANNEL_FAULT</dt> | ||||
| <dd> | ||||
| The server has encountered an unrecoverable fault | ||||
| with the backchannel (e.g., it has lost track of the | ||||
| sequence ID for a slot in the backchannel). The | ||||
| client <bcp14>MUST</bcp14> stop sending more requests on the | ||||
| session's fore channel, wait for all outstanding requests to | ||||
| complete on the fore and back channel, and then | ||||
| destroy the session. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_DEVID_CHANGED</dt> | ||||
| <dd> | ||||
| The client is using device ID notifications and the server | ||||
| has changed a device ID mapping held by the client. This | ||||
| flag will stay present until the client has obtained the new | ||||
| mapping with GETDEVICEINFO. | ||||
| </dd> | ||||
| <dt>SEQ4_STATUS_DEVID_DELETED</dt> | ||||
| <dd> | ||||
| The client is using device ID notifications and the server | ||||
| has deleted a device ID mapping held by the client. | ||||
| This flag will stay in effect until the client sends a GETDEVICEINFO | ||||
| on the device ID with a null value in the argument gdia_notify_types. | ||||
| </dd> | ||||
| </dl> | ||||
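| <t> | ||||
| As an informal illustration of how a client might inspect sr_status_flags, the sketch below decodes the word into flag names. It assumes the bit values given in the SEQUENCE result definition; the class and helper names (SeqStatus, decode_status_flags, has_revoked_state) are hypothetical and not part of the protocol. | ||||
| </t> | ||||

```python
from enum import IntFlag


class SeqStatus(IntFlag):
    """sr_status_flags bits (values assumed from the SEQUENCE result definition)."""
    CB_PATH_DOWN = 0x00000001
    CB_GSS_CONTEXTS_EXPIRING = 0x00000002
    CB_GSS_CONTEXTS_EXPIRED = 0x00000004
    EXPIRED_ALL_STATE_REVOKED = 0x00000008
    EXPIRED_SOME_STATE_REVOKED = 0x00000010
    ADMIN_STATE_REVOKED = 0x00000020
    RECALLABLE_STATE_REVOKED = 0x00000040
    LEASE_MOVED = 0x00000080
    RESTART_RECLAIM_NEEDED = 0x00000100
    CB_PATH_DOWN_SESSION = 0x00000200
    BACKCHANNEL_FAULT = 0x00000400
    DEVID_CHANGED = 0x00000800
    DEVID_DELETED = 0x00001000


# Hypothetical grouping: bits whose descriptions mention FREE_STATEID.
REVOCATION_BITS = (
    SeqStatus.EXPIRED_ALL_STATE_REVOKED
    | SeqStatus.EXPIRED_SOME_STATE_REVOKED
    | SeqStatus.ADMIN_STATE_REVOKED
    | SeqStatus.RECALLABLE_STATE_REVOKED
)


def decode_status_flags(sr_status_flags):
    """Return the names of all known bits set in sr_status_flags."""
    return [flag.name for flag in SeqStatus if sr_status_flags & flag.value]


def has_revoked_state(sr_status_flags):
    """True if any bit reporting revoked state is set."""
    return bool(sr_status_flags & REVOCATION_BITS.value)
```

| <t> | ||||
| A client could call decode_status_flags on every SEQUENCE reply and, when has_revoked_state is true, go on to probe its stateids; how it reacts to each bit remains governed by the descriptions above. | ||||
| </t> | ||||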
| <t> | ||||
| The value of the sa_sequenceid argument relative to | ||||
| the cached sequence ID on the slot falls into one | ||||
| of three cases. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the difference between sa_sequenceid and | ||||
| the server's cached sequence ID at the slot ID | ||||
| is two (2) or more, | ||||
| or if sa_sequenceid is less | ||||
| than the cached sequence ID (accounting | ||||
| for wraparound of the unsigned sequence ID value), | ||||
| then the server <bcp14>MUST</bcp14> return NFS4ERR_SEQ_MISORDERED. | ||||
| </li> | ||||
| <li> | ||||
| If sa_sequenceid and the cached sequence ID are | ||||
| the same, this is a retry, and the server replies | ||||
| with what is recorded in the reply | ||||
| cache. | ||||
| The lease is possibly renewed as described below. | ||||
| </li> | ||||
| <li> | ||||
| If sa_sequenceid is one greater (accounting for | ||||
| wraparound) than the cached sequence ID, then | ||||
| this is a new request, and the slot's sequence | ||||
| ID is incremented. The operations subsequent to | ||||
| SEQUENCE, if any, are processed. If there are no | ||||
| other operations, the only other effects are to | ||||
| cache the SEQUENCE reply in the slot, maintain the | ||||
| session's activity, and possibly renew the lease. | ||||
| </li> | ||||
| </ul> | ||||
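| <t> | ||||
| The three cases above can be sketched with modular arithmetic on the unsigned 32-bit sequence ID. This is a minimal illustration only; the function name and the string labels it returns are hypothetical. | ||||
| </t> | ||||

```python
SEQ_MOD = 2 ** 32  # sequence IDs are unsigned 32-bit values that wrap around


def classify_sequence(sa_sequenceid, cached_sequenceid):
    """Classify a request against the slot's cached sequence ID."""
    # Distance from the cached value, computed modulo 2**32 so that
    # 0xFFFFFFFF followed by 0 counts as "one greater".
    delta = (sa_sequenceid - cached_sequenceid) % SEQ_MOD
    if delta == 0:
        return "RETRY"  # reply comes from the reply cache
    if delta == 1:
        return "NEW"    # slot's sequence ID is incremented
    # Two or more ahead, or behind the cached value.
    return "NFS4ERR_SEQ_MISORDERED"
```

| <t> | ||||
| Because the subtraction is taken modulo 2**32, a value that is "less than" the cached sequence ID shows up as a large positive distance and lands in the misordered case together with values that are two or more ahead. | ||||
| </t> | ||||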
| <t> | ||||
| If the client reuses a slot ID and sequence ID for | ||||
| a completely different request, the server <bcp14>MAY</bcp14> treat | ||||
| the request as if it is a retry of what it has already | ||||
| executed. The server <bcp14>MAY</bcp14>, however, detect the client's | ||||
| illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | ||||
| </t> | ||||
| <t> | ||||
| If SEQUENCE returns an error, then the state of the | ||||
| slot (sequence ID, cached reply) <bcp14>MUST NOT</bcp14> change, | ||||
| and the associated lease <bcp14>MUST NOT</bcp14> be renewed. | ||||
| </t> | ||||
| <t> | ||||
| If SEQUENCE returns NFS4_OK, then the associated | ||||
| lease <bcp14>MUST</bcp14> be renewed (see <xref target="lease_renewal" format="default"/>), | ||||
| except if SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is | ||||
| returned in sr_status_flags. | ||||
| </t> | ||||
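| <t> | ||||
| The lease renewal rule in the two preceding paragraphs can be expressed as a small predicate. The sketch below assumes the numeric values NFS4_OK = 0 and SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008 from the protocol's XDR definitions; the function name is hypothetical. | ||||
| </t> | ||||

```python
NFS4_OK = 0
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008


def should_renew_lease(sr_status, sr_status_flags):
    """Decide whether a SEQUENCE result renews the associated lease."""
    if sr_status != NFS4_OK:
        # An error leaves the slot state and the lease untouched.
        return False
    # Success renews the lease unless all state was revoked on expiry.
    return not (sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED)
```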
| </section> | ||||
| <section toc="exclude" anchor="OP_SEQUENCE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The server <bcp14>MUST</bcp14> maintain a mapping of session ID to client ID | ||||
| in order to validate any operations that follow SEQUENCE | ||||
| that take a stateid as an argument and/or result. | ||||
| </t> | ||||
| <t> | ||||
| If the client establishes a persistent session, then | ||||
| a SEQUENCE received after a server restart might encounter | ||||
| requests performed and recorded in a persistent reply | ||||
| cache before the server restart. In this case, SEQUENCE | ||||
| will be processed successfully, while requests that | ||||
| were not previously performed and recorded are rejected with | ||||
| NFS4ERR_DEADSESSION. | ||||
| </t> | ||||
| <t> | ||||
| Depending on which of the operations within the COMPOUND were | ||||
| successfully | ||||
| performed before the server restart, these operations will | ||||
| also have replies sent from the server reply cache. | ||||
| Note that when these operations establish locking state, it | ||||
| is locking state that applies to the previous server instance | ||||
| and to the previous client ID, even though the | ||||
| server restart, which logically happened after these | ||||
| operations, eliminated that state. In the | ||||
| case of a partially executed COMPOUND, processing may reach | ||||
| an operation not processed during the earlier server instance, | ||||
| making this operation a new one and not performable on the | ||||
| existing session. In this case, NFS4ERR_DEADSESSION will be | ||||
| returned from that operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_SET_SSV" numbered="true" toc="default"> | ||||
| <name>Operation 54: SET_SSV - Update SSV for a Client ID</name> | ||||
| <section toc="exclude" anchor="OP_SET_SSV_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct ssa_digest_input4 { | ||||
| SEQUENCE4args sdi_seqargs; | ||||
| }; | ||||
| struct SET_SSV4args { | ||||
| opaque ssa_ssv<>; | ||||
| opaque ssa_digest<>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SET_SSV_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct ssr_digest_input4 { | ||||
| SEQUENCE4res sdi_seqres; | ||||
| }; | ||||
| struct SET_SSV4resok { | ||||
| opaque ssr_digest<>; | ||||
| }; | ||||
| union SET_SSV4res switch (nfsstat4 ssr_status) { | ||||
| case NFS4_OK: | ||||
| SET_SSV4resok ssr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SET_SSV_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is used to update the | ||||
| SSV for a client ID. Before SET_SSV is called the | ||||
| first time on a client ID, the SSV is zero. | ||||
| The SSV is the key used for the SSV GSS mechanism | ||||
| (<xref target="ssv_mech" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| SET_SSV <bcp14>MUST</bcp14> be preceded by a | ||||
| SEQUENCE operation in the same COMPOUND. | ||||
| It <bcp14>MUST NOT</bcp14> be used if the client | ||||
| did not opt for SP4_SSV state protection when the | ||||
| client ID was created | ||||
| (see <xref target="OP_EXCHANGE_ID" format="default"/>); | ||||
| the server returns NFS4ERR_INVAL in that case. | ||||
| </t> | ||||
| <t> | ||||
| The field ssa_digest is computed as the output of | ||||
| the HMAC (<xref target="RFC2104" format="default">RFC 2104</xref>) using the subkey derived | ||||
| from the SSV4_SUBKEY_MIC_I2T and current SSV | ||||
| as the key (see <xref target="ssv_mech" format="default"/> for a | ||||
| description of subkeys), and an XDR encoded value of data type ssa_digest_input4. | ||||
| The field sdi_seqargs is equal to the | ||||
| arguments of the SEQUENCE operation | ||||
| for the COMPOUND procedure that | ||||
| SET_SSV is within. | ||||
| </t> | ||||
| <t> | ||||
| The argument ssa_ssv | ||||
| is XORed with the current SSV to produce | ||||
| the new SSV. The argument ssa_ssv <bcp14>SHOULD</bcp14> be generated randomly. | ||||
| </t> | ||||
| <t> | ||||
| In the response, ssr_digest is the output of the HMAC using the | ||||
| subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, | ||||
| and an XDR encoded value of data type ssr_digest_input4. The | ||||
| field sdi_seqres is equal to the results of the SEQUENCE | ||||
| operation for the COMPOUND procedure that SET_SSV is within. | ||||
| </t> | ||||
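| <t> | ||||
| As a rough sketch of the computations described above, the following shows the XOR update of the SSV and an HMAC over an already XDR-encoded digest input. Several things are assumptions made for illustration: the subkey is passed in ready-made (subkey derivation is not shown), SHA-256 stands in for whatever hash algorithm applies to the client ID, the two XOR operands are assumed to have equal length, and the function names are hypothetical. | ||||
| </t> | ||||

```python
import hmac


def xor_ssv(current_ssv, ssa_ssv):
    """Return the new SSV: the current SSV XORed with ssa_ssv."""
    if len(current_ssv) != len(ssa_ssv):
        # Assumption of this sketch: both operands have the same length.
        raise ValueError("ssa_ssv length differs from the current SSV")
    return bytes(a ^ b for a, b in zip(current_ssv, ssa_ssv))


def ssv_digest(subkey, xdr_encoded_input, hash_name="sha256"):
    """HMAC (RFC 2104) over an XDR-encoded ssa_digest_input4 or
    ssr_digest_input4, keyed with the given subkey."""
    return hmac.new(subkey, xdr_encoded_input, hash_name).digest()
```

| <t> | ||||
| In this sketch, the request digest would be computed with the subkey derived from the current SSV, and the reply digest with the subkey derived from the value returned by xor_ssv. | ||||
| </t> | ||||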
| <t> | ||||
| As noted in <xref target="OP_EXCHANGE_ID" format="default"/>, the client and | ||||
| server can maintain multiple concurrent versions of the SSV. | ||||
| The client and server each <bcp14>MUST</bcp14> maintain an internal | ||||
| SSV version number, which is set to one the first time | ||||
| SET_SSV executes on the server and the client | ||||
| receives the first SET_SSV reply. Each subsequent | ||||
| SET_SSV increases the internal SSV version number by one. The | ||||
| value of this version number corresponds to the smpt_ssv_seq, | ||||
| smt_ssv_seq, sspt_ssv_seq, and ssct_ssv_seq fields of the | ||||
| SSV GSS mechanism tokens (see <xref target="ssv_mech" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_SET_SSV_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| When the server receives ssa_digest, it <bcp14>MUST</bcp14> verify the digest | ||||
| by computing the digest the same way the client did and | ||||
| comparing it with ssa_digest. If the server gets a different | ||||
| result, this is an error, NFS4ERR_BAD_SESSION_DIGEST. | ||||
| This error might be the result of another SET_SSV from the | ||||
| same client ID changing the SSV. If so, the client recovers | ||||
| by sending a SET_SSV operation again with a recomputed digest based on | ||||
| the subkey of the new SSV. If the transport connection is dropped after | ||||
| the SET_SSV request is sent, but before the | ||||
| SET_SSV reply is received, then there are special considerations | ||||
| for recovery if the client has no more connections associated | ||||
| with sessions associated with the client ID of the SSV. See | ||||
| <xref target="OP_BIND_CONN_TO_SESSION_IMPLEMENTATION" format="default"/>. | ||||
| </t> | ||||
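| <t> | ||||
| A server-side check along the lines described above might look like the following sketch. It assumes the server already holds the appropriate subkey and the XDR-encoded digest input, uses SHA-256 purely as a placeholder hash, and returns status names as strings; the function name is hypothetical. The constant-time comparison is a design choice of this sketch, not a requirement stated here. | ||||
| </t> | ||||

```python
import hmac


def verify_ssa_digest(subkey, xdr_digest_input, ssa_digest, hash_name="sha256"):
    """Recompute the digest the way the client did and compare."""
    expected = hmac.new(subkey, xdr_digest_input, hash_name).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if hmac.compare_digest(expected, ssa_digest):
        return "NFS4_OK"
    return "NFS4ERR_BAD_SESSION_DIGEST"
```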
| <t> | ||||
| Clients <bcp14>SHOULD NOT</bcp14> send an ssa_ssv that is equal to a previous | ||||
| ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv equal to zero | ||||
| since the SSV is initialized to zero when the client ID is created). | ||||
| </t> | ||||
| <t> | ||||
| Clients <bcp14>SHOULD</bcp14> send SET_SSV with RPCSEC_GSS privacy. Servers | ||||
| <bcp14>MUST</bcp14> support RPCSEC_GSS with privacy for any COMPOUND that has { | ||||
| SEQUENCE, SET_SSV }. | ||||
| </t> | ||||
| <t> | ||||
| A client <bcp14>SHOULD NOT</bcp14> send SET_SSV with the SSV GSS | ||||
| mechanism's credential because the purpose of SET_SSV | ||||
| is to seed the SSV from non-SSV credentials. Instead, | ||||
| SET_SSV <bcp14>SHOULD</bcp14> be sent with the credential of | ||||
| a user that is accessing the client ID for the | ||||
| first time | ||||
| (<xref target="protect_state_change" format="default"/>). | ||||
| However, if the client does send SET_SSV with SSV | ||||
| credentials, the digest protecting the arguments | ||||
| uses the value of the SSV before ssa_ssv is XORed in, | ||||
| and the digest protecting the results uses the value | ||||
| of the SSV after the ssa_ssv is XORed in. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_TEST_STATEID" numbered="true" toc="default"> | ||||
| <name>Operation 55: TEST_STATEID - Test Stateids for Validity</name> | ||||
| <section toc="exclude" anchor="OP_TEST_STATEID_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct TEST_STATEID4args { | ||||
| stateid4 ts_stateids<>; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_TEST_STATEID_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct TEST_STATEID4resok { | ||||
| nfsstat4 tsr_status_codes<>; | ||||
| }; | ||||
| union TEST_STATEID4res switch (nfsstat4 tsr_status) { | ||||
| case NFS4_OK: | ||||
| TEST_STATEID4resok tsr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_TEST_STATEID4_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The TEST_STATEID operation is used to check the validity of | ||||
| a set of stateids. It can be used at any time, but the client | ||||
| should definitely use it when it | ||||
| receives an indication that one or more of its stateids have been | ||||
| invalidated due to lock revocation. This occurs when the SEQUENCE | ||||
| operation returns with one of the following sr_status_flags set: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | ||||
| </li> | ||||
| <li> | ||||
| SEQ4_STATUS_ADMIN_STATE_REVOKED | ||||
| </li> | ||||
| <li> | ||||
| SEQ4_STATUS_RECALLABLE_STATE_REVOKED | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The client can use TEST_STATEID one or more times to test the | ||||
| validity of its stateids. Each use of TEST_STATEID allows a large | ||||
| set of such stateids to be tested and avoids problems with earlier | ||||
| stateids in a COMPOUND request from interfering with the checking of | ||||
| subsequent stateids, as would happen if individual stateids were | ||||
| tested by a series of corresponding operations in a COMPOUND | ||||
| request. | ||||
| </t> | ||||
| <t> | ||||
| For each stateid, the server returns the status code that | ||||
| would be returned if that stateid were to be used in normal | ||||
| operation. Returning such a status indication is not an | ||||
| error and does not cause COMPOUND processing to terminate. Checks | ||||
| for the validity of the stateid proceed as they would for | ||||
| normal operations with a number of exceptions: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| There is no check for the type of stateid object, as would be | ||||
| the case for normal use of a stateid. | ||||
| </li> | ||||
| <li> | ||||
| There is no reference to the current filehandle. | ||||
| </li> | ||||
| <li> | ||||
| Special stateids are always considered invalid (they result | ||||
| in the error code NFS4ERR_BAD_STATEID). | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| All stateids are interpreted as being associated with the client | ||||
| for the current session. Any possible association with a previous | ||||
| instance of the client (as stale stateids) is not considered. | ||||
| </t> | ||||
| <t> | ||||
| The valid status values in the returned tsr_status_codes array | ||||
| are NFS4_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, | ||||
| NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. | ||||
| </t> | ||||
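| <t> | ||||
| One way a client might process a TEST_STATEID reply is sketched below. It assumes that tsr_status_codes corresponds positionally to ts_stateids, uses the numeric error values from the protocol's nfsstat4 definition, and treats NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED as the codes indicating revoked state; that grouping and the function name are assumptions of this sketch. | ||||
| </t> | ||||

```python
NFS4_OK = 0
NFS4ERR_EXPIRED = 10011
NFS4ERR_OLD_STATEID = 10024
NFS4ERR_BAD_STATEID = 10025
NFS4ERR_ADMIN_REVOKED = 10047
NFS4ERR_DELEG_REVOKED = 10087

# Assumed grouping: codes that report lost (revoked or expired) state.
REVOKED_CODES = frozenset(
    {NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, NFS4ERR_DELEG_REVOKED}
)


def revoked_stateids(ts_stateids, tsr_status_codes):
    """Pair each tested stateid with its status and pick the revoked ones."""
    if len(ts_stateids) != len(tsr_status_codes):
        raise ValueError("expected one status code per stateid")
    return [
        stateid
        for stateid, code in zip(ts_stateids, tsr_status_codes)
        if code in REVOKED_CODES
    ]
```

| <t> | ||||
| The stateids returned by such a helper are candidates for FREE_STATEID, which is how the client acknowledges the loss of the corresponding state. | ||||
| </t> | ||||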
| </section> | ||||
| <section toc="exclude" anchor="OP_TEST_STATEID_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| See Sections <xref target="stateid_structure" format="counter"/> and | ||||
| <xref target="stateid_lifetime" format="counter"/> | ||||
| for a discussion of stateid structure, lifetime, and validation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_WANT_DELEGATION" numbered="true" toc="default"> | ||||
| <name>Operation 56: WANT_DELEGATION - Request Delegation</name> | ||||
| <section toc="exclude" anchor="OP_WANT_DELEGATION_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union deleg_claim4 switch (open_claim_type4 dc_claim) { | ||||
| /* | ||||
| * No special rights to object. Ordinary delegation | ||||
| * request of the specified object. Object identified | ||||
| * by filehandle. | ||||
| */ | ||||
| case CLAIM_FH: /* new to v4.1 */ | ||||
| /* CURRENT_FH: object being delegated */ | ||||
| void; | ||||
| /* | ||||
| * Right to file based on a delegation granted | ||||
| * to a previous boot instance of the client. | ||||
| * File is specified by filehandle. | ||||
| */ | ||||
| case CLAIM_DELEG_PREV_FH: /* new to v4.1 */ | ||||
| /* CURRENT_FH: object being delegated */ | ||||
| void; | ||||
| /* | ||||
| * Right to the file established by an open previous | ||||
| * to server reboot. File identified by filehandle. | ||||
| * Used during server reclaim grace period. | ||||
| */ | ||||
| case CLAIM_PREVIOUS: | ||||
| /* CURRENT_FH: object being reclaimed */ | ||||
| open_delegation_type4 dc_delegate_type; | ||||
| }; | ||||
| struct WANT_DELEGATION4args { | ||||
| uint32_t wda_want; | ||||
| deleg_claim4 wda_claim; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_WANT_DELEGATION_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union WANT_DELEGATION4res switch (nfsstat4 wdr_status) { | ||||
| case NFS4_OK: | ||||
| open_delegation4 wdr_resok4; | ||||
| default: | ||||
| void; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_WANT_DELEGATION_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| Where this description mandates the return of a specific error | ||||
| code for a specific condition, and where multiple conditions | ||||
| apply, the server <bcp14>MAY</bcp14> return any of the mandated error codes. | ||||
| </t> | ||||
| <t> | ||||
| This operation allows a client to: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Get a delegation on all types | ||||
| of files except directories. | ||||
| </li> | ||||
| <li> | ||||
| Register a "want" for a delegation for the | ||||
| specified file object, and be notified via a | ||||
| callback when the delegation is available. The | ||||
| server <bcp14>MAY</bcp14> support notifications of availability | ||||
| via callbacks. If the server does not support | ||||
| registration of wants, it <bcp14>MUST NOT</bcp14> return | ||||
| an error to indicate that, and instead <bcp14>MUST</bcp14> | ||||
| return with ond_why set to WND4_CONTENTION or | ||||
| WND4_RESOURCE and ond_server_will_push_deleg or | ||||
| ond_server_will_signal_avail set to FALSE. When the | ||||
| server indicates that it will notify the client | ||||
| by means of a callback, it will either provide | ||||
| the delegation using a CB_PUSH_DELEG operation or | ||||
| cancel its promise by sending a CB_WANTS_CANCELLED | ||||
| operation. | ||||
| </li> | ||||
| <li> | ||||
| Cancel a want for a delegation. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The client <bcp14>SHOULD NOT</bcp14> set OPEN4_SHARE_ACCESS_READ and <bcp14>SHOULD NOT</bcp14> | ||||
| set OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server | ||||
| <bcp14>MUST</bcp14> ignore them. | ||||
| </t> | ||||
| <t> | ||||
| The meanings of the following flags in wda_want are the same as | ||||
| they are in OPEN, except as noted below. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_READ_DELEG | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_NO_DELEG. Unlike the OPEN operation, | ||||
| this flag <bcp14>SHOULD NOT</bcp14> be set by the client in the arguments to | ||||
| WANT_DELEGATION, and <bcp14>MUST</bcp14> be ignored by the server. | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_CANCEL | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | ||||
| </li> | ||||
| <li> | ||||
| OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The handling of the above flags in WANT_DELEGATION is the same | ||||
| as in OPEN. Information about the delegation and/or the | ||||
| promises the server is making regarding future callbacks are | ||||
| the same as those described in the open_delegation4 structure. | ||||
| </t> | ||||
| <t> | ||||
| The successful results of WANT_DELEGATION are of data type | ||||
| open_delegation4, which is the same data type as the "delegation" | ||||
| field in the results of the OPEN operation | ||||
| (see <xref target="OP_OPEN_DESCRIPTION" format="default"/>). | ||||
| The server constructs wdr_resok4 the same way it constructs | ||||
| OPEN's "delegation" with one difference: | ||||
| WANT_DELEGATION <bcp14>MUST NOT</bcp14> return a delegation type of | ||||
| OPEN_DELEGATE_NONE. | ||||
| </t> | ||||
| <t> | ||||
| If ((wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) & | ||||
| ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) is zero, | ||||
| then the client is indicating no | ||||
| explicit desire or non-desire for a delegation and the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_INVAL. | ||||
| </t> | ||||
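| <t> | ||||
| The mask test in the preceding paragraph can be illustrated as follows. The constant values are assumed from the share-access definitions used by OPEN; the function name is hypothetical. | ||||
| </t> | ||||

```python
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200
OPEN4_SHARE_ACCESS_WANT_ANY_DELEG = 0x0300
OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400
OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500


def want_is_invalid(wda_want):
    """True when wda_want expresses no usable want, so that the
    server answers NFS4ERR_INVAL."""
    want = wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    # Clearing the NO_DELEG bit leaves zero both for "no preference"
    # and for a bare WANT_NO_DELEG.
    return (want & ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) == 0
```

| <t> | ||||
| Note that OPEN4_SHARE_ACCESS_WANT_CANCEL (0x0500) survives the test because clearing the WANT_NO_DELEG bit still leaves 0x0100. | ||||
| </t> | ||||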
| <t> | ||||
| The client uses the | ||||
| OPEN4_SHARE_ACCESS_WANT_CANCEL | ||||
| flag in the WANT_DELEGATION | ||||
| operation to cancel a previously requested want for a delegation. | ||||
| Note that if the server is in the process of sending the | ||||
| delegation (via CB_PUSH_DELEG) at the time the client sends | ||||
| a cancellation of the want, the delegation might still be pushed | ||||
| to the client. | ||||
| </t> | ||||
| <t> | ||||
| If WANT_DELEGATION fails to return a delegation, and | ||||
| the server returns NFS4_OK, the server <bcp14>MUST</bcp14> set the | ||||
| delegation type to OPEN_DELEGATE_NONE_EXT, and set | ||||
| od_whynone, as described in <xref target="OP_OPEN" format="default"/>. Write delegations are not available for | ||||
| file types that are not writable. This includes | ||||
| file objects of types NF4BLK, NF4CHR, NF4LNK, | ||||
| NF4SOCK, and NF4FIFO. If the client requests | ||||
| OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without | ||||
| OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with | ||||
| one of the aforementioned file types, the server must | ||||
| set wdr_resok4.od_whynone.ond_why to | ||||
| WND4_WRITE_DELEG_NOT_SUPP_FTYPE. | ||||
| </t> | ||||
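| <t> | ||||
| The file type rule above might be checked as in the sketch below. It represents file types and the reason code by name rather than by numeric value, and it interprets "WANT_WRITE_DELEG without WANT_READ_DELEG" as the want field being exactly OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG; that interpretation, the constant values (assumed from the OPEN definitions), and the function name are assumptions of this sketch. | ||||
| </t> | ||||

```python
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG = 0x0200

# File types listed above as not writable.
NON_WRITABLE_TYPES = frozenset(
    {"NF4BLK", "NF4CHR", "NF4LNK", "NF4SOCK", "NF4FIFO"}
)


def why_none_for_file_type(file_type, wda_want):
    """Return the ond_why name for a write-only want on a non-writable
    file type, or None when this rule does not apply."""
    want = wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    if want == OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG and file_type in NON_WRITABLE_TYPES:
        return "WND4_WRITE_DELEG_NOT_SUPP_FTYPE"
    return None
```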
| </section> | ||||
| <section toc="exclude" anchor="OP_WANT_DELEGATION_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| A request for a conflicting delegation is not normally intended to trigger | ||||
| the recall of the existing delegation. Servers may choose to treat | ||||
| some clients as having higher priority such that their wants will | ||||
| trigger recall of an existing delegation, although that is expected | ||||
| to be an unusual situation. | ||||
| </t> | ||||
| <t> | ||||
| Servers will generally recall delegations assigned by WANT_DELEGATION | ||||
| on the same basis as those assigned by OPEN. CB_RECALL will generally | ||||
| be done only when other clients perform operations inconsistent with | ||||
| the delegation. The normal response to aging of delegations is to use | ||||
| CB_RECALL_ANY, in order to give the client the opportunity to keep | ||||
| the delegations most useful from its point of view. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_DESTROY_CLIENTID" numbered="true" toc="default"> | ||||
| <name>Operation 57: DESTROY_CLIENTID - Destroy a Client ID</name> | ||||
| <section toc="exclude" anchor="OP_DESTROY_CLIENTID_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DESTROY_CLIENTID4args { | ||||
| clientid4 dca_clientid; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DESTROY_CLIENTID_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct DESTROY_CLIENTID4res { | ||||
| nfsstat4 dcr_status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_DESTROY_CLIENTID_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The DESTROY_CLIENTID operation destroys the | ||||
| client ID. If there are sessions (both idle and | ||||
| non-idle), opens, locks, delegations, layouts, | ||||
| and/or wants (<xref target="OP_WANT_DELEGATION" format="default"/>) | ||||
| associated with the unexpired lease of the client | ||||
| ID, the server <bcp14>MUST</bcp14> return NFS4ERR_CLIENTID_BUSY. | ||||
| DESTROY_CLIENTID <bcp14>MAY</bcp14> be preceded with a SEQUENCE | ||||
| operation as long as the client ID derived from the | ||||
| session ID of SEQUENCE is not the same as the client | ||||
| ID to be destroyed. If the client IDs are the same, | ||||
| then the server <bcp14>MUST</bcp14> return NFS4ERR_CLIENTID_BUSY. | ||||
| </t> | ||||
| <t> | ||||
| If DESTROY_CLIENTID is not prefixed by SEQUENCE, | ||||
| it <bcp14>MUST</bcp14> be the only operation in the COMPOUND | ||||
| request (otherwise, the server <bcp14>MUST</bcp14> return | ||||
| NFS4ERR_NOT_ONLY_OP). If the operation is sent | ||||
| without a SEQUENCE preceding it, a client that | ||||
| retransmits the request may receive an error in | ||||
| response, because the original request might have | ||||
| been successfully executed. | ||||
| </t> | ||||
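| <t> | ||||
| The checks described in this section could be arranged on the server roughly as follows. The order of the checks, the parameter names, and the use of status names as strings are assumptions of this sketch; sequence_clientid stands for the client ID derived from the session of a preceding SEQUENCE, or None when there is no SEQUENCE. | ||||
| </t> | ||||

```python
def destroy_clientid_status(target_clientid, has_sessions_or_state,
                            sequence_clientid=None, ops_in_compound=1):
    """Pick the status a server would return for DESTROY_CLIENTID."""
    preceded_by_sequence = sequence_clientid is not None
    if not preceded_by_sequence and ops_in_compound != 1:
        # Without SEQUENCE, it must be the only operation in the COMPOUND.
        return "NFS4ERR_NOT_ONLY_OP"
    if preceded_by_sequence and sequence_clientid == target_clientid:
        # The session carrying the request belongs to the target client ID.
        return "NFS4ERR_CLIENTID_BUSY"
    if has_sessions_or_state:
        # Sessions, opens, locks, delegations, layouts, or wants remain.
        return "NFS4ERR_CLIENTID_BUSY"
    return "NFS4_OK"
```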
| </section> | ||||
| <section toc="exclude" anchor="OP_DESTROY_CLIENTID_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| DESTROY_CLIENTID allows a server to immediately | ||||
| reclaim the resources consumed by an unused client | ||||
| ID, and also to forget that it ever generated the | ||||
| client ID. By forgetting that it ever generated the client | ||||
| ID, the server can safely reuse the client ID on a | ||||
| future EXCHANGE_ID operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_RECLAIM_COMPLETE" numbered="true" toc="default"> | ||||
| <name>Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished</name> | ||||
| <section toc="exclude" anchor="OP_RECLAIM_COMPLETE_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr" markers="true"><![CDATA[ | ||||
| struct RECLAIM_COMPLETE4args { | ||||
| /* | ||||
| * If rca_one_fs TRUE, | ||||
| * | ||||
| * CURRENT_FH: object in | ||||
| * file system reclaim is | ||||
| * complete for. | ||||
| */ | ||||
| bool rca_one_fs; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RECLAIM_COMPLETE_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr" markers="true"><![CDATA[ | ||||
| struct RECLAIM_COMPLETE4res { | ||||
| nfsstat4 rcr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_RECLAIM_COMPLETE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| A RECLAIM_COMPLETE operation is used to indicate that the client | ||||
| has reclaimed all of the locking state that it will recover using | ||||
| reclaim, | ||||
| when it is recovering state due to either a server restart or the | ||||
| migration of a file system to another server. There are two types | ||||
| of RECLAIM_COMPLETE operations: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being | ||||
| done. This indicates that recovery of all | ||||
| locks that the client held on the previous server instance | ||||
| has been completed. The current filehandle need not be set in | ||||
| this case. | ||||
| </li> | ||||
| <li> | ||||
| When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE | ||||
| is being done. This indicates that recovery of locks | ||||
| for a single fs (the one designated by the current filehandle) | ||||
| due to the migration of the file system has been completed. Presence | ||||
| of a current filehandle is required when rca_one_fs is set to TRUE. | ||||
| When the current filehandle designates a filehandle in a file system | ||||
| not in the process of migration, the operation returns NFS4_OK and | ||||
| is otherwise ignored. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Once a RECLAIM_COMPLETE is done, there can be no further | ||||
| reclaim operations for locks whose scope is defined as having | ||||
| completed recovery. Once the client sends RECLAIM_COMPLETE, | ||||
| the server will not allow the client to do | ||||
| subsequent reclaims of locking state for that scope | ||||
| and, if these are attempted, will return NFS4ERR_NO_GRACE. | ||||
| </t> | ||||
| <t> | ||||
| Whenever a client establishes a new client ID and before it does | ||||
| the first non-reclaim operation that obtains a lock, it <bcp14>MUST</bcp14> send a | ||||
| RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there | ||||
| are no locks to | ||||
| reclaim. If non-reclaim | ||||
| locking operations are done before the RECLAIM_COMPLETE, an NFS4ERR_GRACE | ||||
| error will be returned. | ||||
| </t> | ||||
| <t> | ||||
| Similarly, when the client accesses a migrated file system on a new | ||||
| server, before it sends the first non-reclaim operation that | ||||
| obtains a lock on this new server, it <bcp14>MUST</bcp14> send a RECLAIM_COMPLETE | ||||
| with rca_one_fs set to TRUE and current filehandle within that file system, | ||||
| even if there are no locks to reclaim. If non-reclaim locking | ||||
| operations are done on that file system before the | ||||
| RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. | ||||
| </t> | ||||
| <t> | ||||
| There are situations in which a client needs | ||||
| to issue both forms of RECLAIM_COMPLETE. An example is an instance | ||||
| of file system migration in which the file system is migrated to a | ||||
| server for which the client has no client ID. As a result, the client | ||||
| needs to obtain a client ID from the server (incurring the responsibility | ||||
| to do RECLAIM_COMPLETE with rca_one_fs set to FALSE) as well as | ||||
| RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete the per-fs | ||||
| grace period associated with the file system migration. These two | ||||
| may be done in any order as long as all necessary lock reclaims | ||||
| have been done before | ||||
| issuing either of them. | ||||
| </t> | ||||
| <t> | ||||
| Any locks not reclaimed at the point at which RECLAIM_COMPLETE | ||||
| is done become non-reclaimable. The client <bcp14>MUST NOT</bcp14> attempt | ||||
| to reclaim them, either during | ||||
| the current server instance or in any subsequent | ||||
| server instance, or on another server to which responsibility | ||||
| for that file system is transferred. If the client were to do so, | ||||
| it would be | ||||
| violating the protocol by representing itself as owning locks | ||||
| that it does not own, and so has no right to reclaim. See | ||||
| <xref target="RFC5661" sectionFormat="of" section="8.4.3"/> for a | ||||
| discussion of edge conditions related to lock reclaim. | ||||
| </t> | ||||
| <t> | ||||
| By sending a RECLAIM_COMPLETE, the client indicates readiness | ||||
| to proceed to do normal non-reclaim locking operations. The client | ||||
| should be aware that such operations may temporarily result in | ||||
| NFS4ERR_GRACE errors until the server is ready to terminate its | ||||
| grace period. | ||||
| </t> | ||||
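| <t> | ||||
| The scoping rules in this section, together with the | ||||
| NFS4ERR_COMPLETE_ALREADY handling described under IMPLEMENTATION, can | ||||
| be sketched as a small per-client-ID state tracker. This is a | ||||
| non-normative illustration under stated assumptions: the class | ||||
| ReclaimTracker, its method names, the use of fsid strings, and the use of | ||||
| NFS4ERR_NOFILEHANDLE for a missing current filehandle are choices of the | ||||
| sketch. It also ignores the fact that a real server may keep returning | ||||
| NFS4ERR_GRACE until it is ready to terminate its grace period. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
class ReclaimTracker:
    """Per-client-ID record of which reclaim scopes are complete."""

    def __init__(self, fs_in_grace=()):
        # fsids currently in a per-fs grace period (hypothetical input).
        self.fs_in_grace = set(fs_in_grace)
        self.global_done = False
        self.fs_done = set()

    def reclaim_complete(self, rca_one_fs, fsid=None):
        if not rca_one_fs:
            # Global form: the current filehandle need not be set.
            if self.global_done:
                return "NFS4ERR_COMPLETE_ALREADY"
            self.global_done = True
            return "NFS4_OK"
        if fsid is None:
            # Assumed error when the required filehandle is absent.
            return "NFS4ERR_NOFILEHANDLE"
        if fsid not in self.fs_in_grace:
            # Not being migrated: return NFS4_OK and otherwise ignore.
            return "NFS4_OK"
        if fsid in self.fs_done:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.fs_done.add(fsid)
        return "NFS4_OK"

    def reclaim_lock(self, fsid):
        # Reclaims are refused once the relevant scope is complete.
        if fsid in self.fs_in_grace:
            done = fsid in self.fs_done
        else:
            done = self.global_done
        return "NFS4ERR_NO_GRACE" if done else "NFS4_OK"

    def new_lock(self, fsid):
        # Non-reclaim locking requires RECLAIM_COMPLETE first.
        if not self.global_done:
            return "NFS4ERR_GRACE"
        if fsid in self.fs_in_grace and fsid not in self.fs_done:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"
```
| ]]></sourcecode> | ||||
| <t> | ||||
| In this sketch, a client that obtains a new client ID on the destination | ||||
| of a migration has to complete both the global and the per-fs scope | ||||
| before new_lock succeeds on the migrated file system. | ||||
| </t> | ||||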
| </section> | ||||
| <section toc="exclude" anchor="OP_RECLAIM_COMPLETE_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| Servers will typically use the information as to when reclaim | ||||
| activity is complete to reduce the length of the grace period. | ||||
| When the server maintains in persistent storage | ||||
| a list of clients that might have had locks, | ||||
| it is able to use the fact that | ||||
| all such clients have done a RECLAIM_COMPLETE to terminate the | ||||
| grace period and begin normal operations (i.e., grant requests | ||||
| for new locks) sooner than it might otherwise. | ||||
| </t> | ||||
| <t> | ||||
| Latency can be minimized by doing a RECLAIM_COMPLETE as part of | ||||
| the COMPOUND request in which the last lock-reclaiming operation | ||||
| is done. When there are no reclaims to be done, RECLAIM_COMPLETE | ||||
| should be done immediately in order to allow the grace period | ||||
| to end as soon as possible. | ||||
| </t> | ||||
| <t> | ||||
| RECLAIM_COMPLETE should only be done once for each server instance | ||||
| or occasion of the transition of a file system. | ||||
| If it is done a second time, the error NFS4ERR_COMPLETE_ALREADY will | ||||
| result. Note that because of the session feature's retry protection, | ||||
| retries of COMPOUND | ||||
| requests containing a RECLAIM_COMPLETE operation will not result | ||||
| in this error. | ||||
| </t> | ||||
| <t> | ||||
| When a RECLAIM_COMPLETE is sent, the client effectively acknowledges | ||||
| any locks not yet reclaimed as lost. This allows the server to | ||||
| re-enable the client to recover locks if the occurrence of edge | ||||
| conditions, as described in | ||||
| <xref target="network_partitions_and_recovery" format="default"/>, | ||||
| had caused the server to disable the client's ability to | ||||
| recover locks. | ||||
| </t> | ||||
| <t> | ||||
| Because previous descriptions of RECLAIM_COMPLETE were not | ||||
| sufficiently explicit about the circumstances in which use of | ||||
| RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, | ||||
| there have been cases in which it has been misused by clients that | ||||
| have issued RECLAIM_COMPLETE with rca_one_fs set to TRUE when it | ||||
| should not have been. There have also been | ||||
| cases in which servers have, in various ways, not responded to | ||||
| such misuse as described above, either ignoring the rca_one_fs | ||||
| setting (treating the operation as a global RECLAIM_COMPLETE) or | ||||
| ignoring the entire operation. | ||||
| </t> | ||||
| <t> | ||||
| While clients <bcp14>SHOULD NOT</bcp14> misuse | ||||
| this feature, and servers <bcp14>SHOULD</bcp14> respond to such misuse as described | ||||
| above, implementors need to be aware of the following considerations | ||||
| as they make necessary trade-offs between interoperability with | ||||
| existing implementations and proper support for facilities to | ||||
| allow lock recovery in the event of file system migration. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| When servers have no support for becoming the destination server | ||||
| of a file system subject to migration, there is no possibility of | ||||
| a per-fs RECLAIM_COMPLETE being done legitimately, and occurrences of it | ||||
| <bcp14>SHOULD</bcp14> be ignored. However, the negative consequences of accepting | ||||
| such mistaken use are quite limited as long as the client does | ||||
| not issue it | ||||
| before all necessary reclaims are done. | ||||
| </li> | ||||
| <li> | ||||
| When a server might become the destination for a file system being | ||||
| migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more | ||||
| concerning. In the case in which the file system designated is not | ||||
| within a per-fs grace period, the per-fs RECLAIM_COMPLETE <bcp14>SHOULD</bcp14> | ||||
| be ignored, with the | ||||
| negative consequences of accepting it being limited, as in the | ||||
| case in which migration is not supported. However, if the server | ||||
| encounters a file system undergoing migration, the operation | ||||
| cannot be accepted | ||||
| as if it were a global RECLAIM_COMPLETE without invalidating its | ||||
| intended use. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_ILLEGAL" numbered="true" toc="default"> | ||||
| <name>Operation 10044: ILLEGAL - Illegal Operation</name> | ||||
| <section toc="exclude" anchor="OP_ILLEGAL_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_ILLEGAL_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct ILLEGAL4res { | ||||
| nfsstat4 status; | ||||
| };]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_ILLEGAL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is a placeholder for encoding a result to handle the | ||||
| case of the client sending an operation code within COMPOUND that is | ||||
| not supported. See the COMPOUND procedure description for more | ||||
| details. | ||||
| </t> | ||||
| <t> | ||||
| The status field of ILLEGAL4res <bcp14>MUST</bcp14> be set to NFS4ERR_OP_ILLEGAL. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_ILLEGAL_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| A client will probably not send an operation with code OP_ILLEGAL, but | ||||
| if it does, the response will be ILLEGAL4res just as it would be with | ||||
| any other invalid operation code. Note that if the server gets an | ||||
| illegal operation code that is not OP_ILLEGAL, and if the server | ||||
| checks for legal operation codes during the XDR decode phase, then the | ||||
| ILLEGAL4res would not be returned. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| </section> | ||||
| <!-- $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="nfsv41callbackprocedures" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Callback Procedures</name> | ||||
| <t> | ||||
| The procedures used for callbacks are defined in the following | ||||
| sections. In the interest of clarity, the terms "client" and "server" | ||||
| refer to NFS clients and servers, despite the fact that for an | ||||
| individual callback RPC, the sense of these terms would be precisely | ||||
| the opposite. | ||||
| </t> | ||||
| <t> | ||||
| Both procedures, CB_NULL and CB_COMPOUND, <bcp14>MUST</bcp14> be implemented. | ||||
| </t> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="PROC_CB_NULL" numbered="true" toc="default"> | ||||
| <name>Procedure 0: CB_NULL - No Operation</name> | ||||
| <section toc="exclude" anchor="PROC_CB_NULL_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_CB_NULL_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void;]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_CB_NULL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| CB_NULL is the standard ONC RPC NULL procedure, with the standard void argument and void response. Even though | ||||
| there is no direct functionality associated with this procedure, the | ||||
| server will use CB_NULL to confirm the existence of a path for RPCs | ||||
| from the server to the client. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_CB_NULL_ERRORS" numbered="true"> | ||||
| <name>ERRORS</name> | ||||
| <t> | ||||
| None. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="PROC_CB_COMPOUND" numbered="true" toc="default"> | ||||
| <name>Procedure 1: CB_COMPOUND - Compound Operations</name> | ||||
| <section toc="exclude" anchor="PROC_CB_COMPOUND_ARGUMENTS" numbered="true"> | ||||
| <name>ARGUMENTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| enum nfs_cb_opnum4 { | ||||
| OP_CB_GETATTR = 3, | ||||
| OP_CB_RECALL = 4, | ||||
| /* Callback operations new to NFSv4.1 */ | ||||
| OP_CB_LAYOUTRECALL = 5, | ||||
| OP_CB_NOTIFY = 6, | ||||
| OP_CB_PUSH_DELEG = 7, | ||||
| OP_CB_RECALL_ANY = 8, | ||||
| OP_CB_RECALLABLE_OBJ_AVAIL = 9, | ||||
| OP_CB_RECALL_SLOT = 10, | ||||
| OP_CB_SEQUENCE = 11, | ||||
| OP_CB_WANTS_CANCELLED = 12, | ||||
| OP_CB_NOTIFY_LOCK = 13, | ||||
| OP_CB_NOTIFY_DEVICEID = 14, | ||||
| OP_CB_ILLEGAL = 10044 | ||||
| }; | ||||
| union nfs_cb_argop4 switch (unsigned argop) { | ||||
| case OP_CB_GETATTR: | ||||
| CB_GETATTR4args opcbgetattr; | ||||
| case OP_CB_RECALL: | ||||
| CB_RECALL4args opcbrecall; | ||||
| case OP_CB_LAYOUTRECALL: | ||||
| CB_LAYOUTRECALL4args opcblayoutrecall; | ||||
| case OP_CB_NOTIFY: | ||||
| CB_NOTIFY4args opcbnotify; | ||||
| case OP_CB_PUSH_DELEG: | ||||
| CB_PUSH_DELEG4args opcbpush_deleg; | ||||
| case OP_CB_RECALL_ANY: | ||||
| CB_RECALL_ANY4args opcbrecall_any; | ||||
| case OP_CB_RECALLABLE_OBJ_AVAIL: | ||||
| CB_RECALLABLE_OBJ_AVAIL4args opcbrecallable_obj_avail; | ||||
| case OP_CB_RECALL_SLOT: | ||||
| CB_RECALL_SLOT4args opcbrecall_slot; | ||||
| case OP_CB_SEQUENCE: | ||||
| CB_SEQUENCE4args opcbsequence; | ||||
| case OP_CB_WANTS_CANCELLED: | ||||
| CB_WANTS_CANCELLED4args opcbwants_cancelled; | ||||
| case OP_CB_NOTIFY_LOCK: | ||||
| CB_NOTIFY_LOCK4args opcbnotify_lock; | ||||
| case OP_CB_NOTIFY_DEVICEID: | ||||
| CB_NOTIFY_DEVICEID4args opcbnotify_deviceid; | ||||
| case OP_CB_ILLEGAL: void; | ||||
| }; | ||||
| struct CB_COMPOUND4args { | ||||
| utf8str_cs tag; | ||||
| uint32_t minorversion; | ||||
| uint32_t callback_ident; | ||||
| nfs_cb_argop4 argarray<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_CB_COMPOUND_RESULTS" numbered="true"> | ||||
| <name>RESULTS</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| union nfs_cb_resop4 switch (unsigned resop) { | ||||
| case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; | ||||
| case OP_CB_RECALL: CB_RECALL4res opcbrecall; | ||||
| /* new NFSv4.1 operations */ | ||||
| case OP_CB_LAYOUTRECALL: | ||||
| CB_LAYOUTRECALL4res | ||||
| opcblayoutrecall; | ||||
| case OP_CB_NOTIFY: CB_NOTIFY4res opcbnotify; | ||||
| case OP_CB_PUSH_DELEG: CB_PUSH_DELEG4res | ||||
| opcbpush_deleg; | ||||
| case OP_CB_RECALL_ANY: CB_RECALL_ANY4res | ||||
| opcbrecall_any; | ||||
| case OP_CB_RECALLABLE_OBJ_AVAIL: | ||||
| CB_RECALLABLE_OBJ_AVAIL4res | ||||
| opcbrecallable_obj_avail; | ||||
| case OP_CB_RECALL_SLOT: | ||||
| CB_RECALL_SLOT4res | ||||
| opcbrecall_slot; | ||||
| case OP_CB_SEQUENCE: CB_SEQUENCE4res opcbsequence; | ||||
| case OP_CB_WANTS_CANCELLED: | ||||
| CB_WANTS_CANCELLED4res | ||||
| opcbwants_cancelled; | ||||
| case OP_CB_NOTIFY_LOCK: | ||||
| CB_NOTIFY_LOCK4res | ||||
| opcbnotify_lock; | ||||
| case OP_CB_NOTIFY_DEVICEID: | ||||
| CB_NOTIFY_DEVICEID4res | ||||
| opcbnotify_deviceid; | ||||
| /* Not new operation */ | ||||
| case OP_CB_ILLEGAL: CB_ILLEGAL4res opcbillegal; | ||||
| }; | ||||
| struct CB_COMPOUND4res { | ||||
| nfsstat4 status; | ||||
| utf8str_cs tag; | ||||
| nfs_cb_resop4 resarray<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_COMPOUND_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_COMPOUND procedure is used to combine one or more | ||||
| callback operations into a single RPC request. The main callback RPC | ||||
| program has two main procedures: CB_NULL and CB_COMPOUND. All other | ||||
| operations use the CB_COMPOUND procedure as a wrapper. | ||||
| </t> | ||||
| <t> | ||||
| During the processing of the CB_COMPOUND procedure, the client may find | ||||
| that it does not have the available resources to execute any or all of | ||||
| the operations within the CB_COMPOUND sequence. | ||||
| Refer to <xref target="COMPOUND_Sizing_Issues" format="default"/> for details. | ||||
| </t> | ||||
| <t> | ||||
| The minorversion field of the arguments <bcp14>MUST</bcp14> be the same as the | ||||
| minorversion of the COMPOUND procedure used to create the client ID | ||||
| and session. For NFSv4.1, minorversion <bcp14>MUST</bcp14> be set to 1. | ||||
| </t> | ||||
| <t> | ||||
| Contained within the CB_COMPOUND results is a "status" field. This | ||||
| status <bcp14>MUST</bcp14> be equal to the status of the last operation that was | ||||
| executed within the CB_COMPOUND procedure. Therefore, if an operation | ||||
| incurred an error, then the "status" value will be the same error value | ||||
| as is being returned for the operation that failed. | ||||
| </t> | ||||
| <t> | ||||
| The "tag" field is handled the same way as that of the COMPOUND | ||||
| procedure (see <xref target="OP_COMPOUND_DESCRIPTION" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| Illegal operation codes are handled in the same way as they are | ||||
| handled for the COMPOUND procedure. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="PROC_CB_COMPOUND_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The CB_COMPOUND procedure is used to combine individual operations | ||||
| into a single RPC request. The client interprets each of the | ||||
| operations in turn. If an operation is executed by the client and | ||||
| the status of that operation is NFS4_OK, then the next operation in | ||||
| the CB_COMPOUND procedure is executed. The client continues this | ||||
| process until there are no more operations to be executed or one of | ||||
| the operations has a status value other than NFS4_OK. | ||||
| </t> | ||||
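| <t> | ||||
| The processing loop described above, combined with the rule that the | ||||
| CB_COMPOUND status equals the status of the last operation executed, | ||||
| can be sketched as follows. This is a non-normative illustration: the | ||||
| function process_cb_compound and the representation of operations as | ||||
| (name, handler) pairs are assumptions of the sketch, as is the choice to | ||||
| report NFS4_OK for an empty operation array. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def process_cb_compound(operations):
    """Run callback operations in order, stopping at the first failure.

    operations -- list of (name, handler) pairs; each handler is a
                  zero-argument callable returning a status string
                  (a hypothetical representation used only here).
    Returns (status, resarray), where status is the status of the last
    operation that was executed.
    """
    status = "NFS4_OK"
    resarray = []
    for name, handler in operations:
        status = handler()
        resarray.append((name, status))
        if status != "NFS4_OK":
            # Remaining operations are not executed.
            break
    return status, resarray
```
| ]]></sourcecode> | ||||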
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_COMPOUND_ERRORS" numbered="true"> | ||||
| <name>ERRORS</name> | ||||
| <t> | ||||
| CB_COMPOUND can return any error that an operation on | ||||
| the backchannel can return (see <xref target="cb_op_error_returns" format="default"/>). | ||||
| However, if CB_COMPOUND returns zero operations, the error | ||||
| returned by CB_COMPOUND is unrelated to any error returned by | ||||
| an operation. The errors that CB_COMPOUND can return when it processes | ||||
| zero operations include: | ||||
| </t> | ||||
| <table anchor="CB_compounderrs" align="center"> | ||||
| <name>CB_COMPOUND Error Returns</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Error</th> | ||||
| <th align="left">Notes</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADCHAR</td> | ||||
| <td align="left">The tag argument has a character the replier | ||||
| does not support. </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_BADXDR</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_DELAY</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_INVAL</td> | ||||
| <td align="left">The tag argument is not in UTF-8 encoding.</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_MINOR_VERS_MISMATCH</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_SERVERFAULT</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_TOO_MANY_OPS</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REP_TOO_BIG_TO_CACHE</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NFS4ERR_REQ_TOO_BIG</td> | ||||
| <td align="left"> </td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| </section> | ||||
| <!-- $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="nfsv41cboperations" numbered="true" toc="default"> | ||||
| <name>NFSv4.1 Callback Operations</name> | ||||
| <!-- $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_GETATTR" numbered="true" toc="default"> | ||||
| <name>Operation 3: CB_GETATTR - Get Attributes</name> | ||||
| <section toc="exclude" anchor="OP_CB_GETATTR_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_GETATTR4args { | ||||
| nfs_fh4 fh; | ||||
| bitmap4 attr_request; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_GETATTR_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_GETATTR4resok { | ||||
| fattr4 obj_attributes; | ||||
| }; | ||||
| union CB_GETATTR4res switch (nfsstat4 status) { | ||||
| case NFS4_OK: | ||||
| CB_GETATTR4resok resok4; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_GETATTR_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_GETATTR operation is used by the server to obtain the | ||||
| current modified state of a file that has been OPEN_DELEGATE_WRITE delegated. | ||||
| The size and change attributes are the only ones guaranteed to be | ||||
| serviced by the client. See <xref target="handling_cb_getattr" format="default"/> for a full description | ||||
| of how the client and server are to interact with | ||||
| the use of CB_GETATTR. | ||||
| </t> | ||||
| <t> | ||||
| If the filehandle specified is not one for which the client holds an | ||||
| OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is returned. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_GETATTR_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The client returns attrmask bits and the associated attribute | ||||
| values only for the change attribute and for attributes that it may | ||||
| change (time_modify and size). | ||||
| </t> | ||||
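| <t> | ||||
| The restriction above amounts to masking the requested bitmap4 down to | ||||
| three attributes. The following non-normative sketch treats a bitmap4 as a | ||||
| list of 32-bit words and uses the attribute numbers assigned to change | ||||
| (3), size (4), and time_modify (53); the function name serviced_attrmask | ||||
| and the list representation are assumptions of the sketch. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
FATTR4_CHANGE = 3
FATTR4_SIZE = 4
FATTR4_TIME_MODIFY = 53

# Attributes the client services in CB_GETATTR, per the text above.
SERVICED = (FATTR4_CHANGE, FATTR4_SIZE, FATTR4_TIME_MODIFY)


def serviced_attrmask(attr_request):
    """Restrict a requested bitmap4 to the attributes the client returns.

    attr_request -- list of 32-bit words; attribute n is bit (n % 32)
                    of word (n // 32).
    Returns a list of the same length with only serviced bits kept.
    """
    result = [0] * len(attr_request)
    for attr in SERVICED:
        word, bit = divmod(attr, 32)
        if word < len(attr_request) and attr_request[word] & (1 << bit):
            result[word] |= 1 << bit
    return result
```
| ]]></sourcecode> | ||||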
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_RECALL" numbered="true" toc="default"> | ||||
| <name>Operation 4: CB_RECALL - Recall a Delegation</name> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALL4args { | ||||
| stateid4 stateid; | ||||
| bool truncate; | ||||
| nfs_fh4 fh; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALL4res { | ||||
| nfsstat4 status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_RECALL operation is used to begin the process of recalling | ||||
| a delegation and returning it to the server. | ||||
| </t> | ||||
| <t> | ||||
| The truncate flag is used to optimize recall for a file object that | ||||
| is a regular file and is | ||||
| about to be truncated to zero. When it is TRUE, the client is freed | ||||
| of the obligation to propagate modified data for the file to the | ||||
| server, since this data is irrelevant. | ||||
| </t> | ||||
| <t> | ||||
| If the filehandle specified is not one for which the client holds a | ||||
| delegation, an NFS4ERR_BADHANDLE error is returned. | ||||
| </t> | ||||
| <t> | ||||
| If the stateid specified is not one corresponding to an OPEN | ||||
| delegation for the file specified by the filehandle, an | ||||
| NFS4ERR_BAD_STATEID is returned. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The client <bcp14>SHOULD</bcp14> reply to the callback immediately. | ||||
| Replying does not complete the recall except when | ||||
| the value of the reply's status field is neither | ||||
| NFS4ERR_DELAY nor NFS4_OK. The recall is not complete | ||||
| until the delegation is returned using a DELEGRETURN | ||||
| operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_LAYOUTRECALL" numbered="true" toc="default"> | ||||
| <name>Operation 5: CB_LAYOUTRECALL - Recall Layout from Client</name> | ||||
| <section toc="exclude" anchor="OP_CB_LAYOUTRECALL_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * NFSv4.1 callback arguments and results | ||||
| */ | ||||
| enum layoutrecall_type4 { | ||||
| LAYOUTRECALL4_FILE = LAYOUT4_RET_REC_FILE, | ||||
| LAYOUTRECALL4_FSID = LAYOUT4_RET_REC_FSID, | ||||
| LAYOUTRECALL4_ALL = LAYOUT4_RET_REC_ALL | ||||
| }; | ||||
| struct layoutrecall_file4 { | ||||
| nfs_fh4 lor_fh; | ||||
| offset4 lor_offset; | ||||
| length4 lor_length; | ||||
| stateid4 lor_stateid; | ||||
| }; | ||||
| union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) { | ||||
| case LAYOUTRECALL4_FILE: | ||||
| layoutrecall_file4 lor_layout; | ||||
| case LAYOUTRECALL4_FSID: | ||||
| fsid4 lor_fsid; | ||||
| case LAYOUTRECALL4_ALL: | ||||
| void; | ||||
| }; | ||||
| struct CB_LAYOUTRECALL4args { | ||||
| layouttype4 clora_type; | ||||
| layoutiomode4 clora_iomode; | ||||
| bool clora_changed; | ||||
| layoutrecall4 clora_recall; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_LAYOUTRECALL_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_LAYOUTRECALL4res { | ||||
| nfsstat4 clorr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_LAYOUTRECALL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_LAYOUTRECALL operation is used by the server to recall | ||||
| layouts from the client; as a result, the client will begin the | ||||
| process of returning layouts via LAYOUTRETURN. The | ||||
| CB_LAYOUTRECALL operation specifies one of three forms of recall | ||||
| processing with the value of layoutrecall_type4. The recall is | ||||
| for one of the following: a specific layout of a specific file | ||||
| (LAYOUTRECALL4_FILE), an entire file system ID | ||||
| (LAYOUTRECALL4_FSID), or all file systems (LAYOUTRECALL4_ALL). | ||||
| </t> | ||||
| <t> | ||||
| The behavior of the operation varies based on the value of the | ||||
| layoutrecall_type4. The values and behaviors are: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>LAYOUTRECALL4_FILE</dt> | ||||
| <dd> | ||||
| For a layout to match the recall request, the values of the following fields | ||||
| must match those of the layout: clora_type, clora_iomode, | ||||
| lor_fh, and the byte-range specified by lor_offset and | ||||
| lor_length. The clora_iomode field may have a special value | ||||
| of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will match any | ||||
| iomode originally returned in a layout; therefore, it acts as a | ||||
| wild card. The other special value used is for | ||||
| lor_length. If lor_length has a value of NFS4_UINT64_MAX, the | ||||
| lor_length field means the maximum possible file size. If a | ||||
| matching layout is found, it <bcp14>MUST</bcp14> be returned using the | ||||
| LAYOUTRETURN operation (see <xref target="OP_LAYOUTRETURN" format="default"/>). | ||||
| As an example of the use of these special values, if clora_iomode | ||||
| is LAYOUTIOMODE4_ANY, lor_offset is zero, and lor_length is | ||||
| NFS4_UINT64_MAX, then the entire layout is to be returned. | ||||
| </dd> | ||||
| <dt/> | ||||
| <dd> | ||||
| The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | ||||
| client does not hold layouts for the file or if the client | ||||
| does not have any overlapping layouts for the specification in | ||||
| the layout recall. | ||||
| </dd> | ||||
| <dt>LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL</dt> | ||||
| <dd> | ||||
| If LAYOUTRECALL4_FSID is specified, the fsid specifies the | ||||
| file system for which any outstanding layouts <bcp14>MUST</bcp14> be | ||||
| returned. If LAYOUTRECALL4_ALL is specified, all outstanding | ||||
| layouts <bcp14>MUST</bcp14> be returned. In addition, LAYOUTRECALL4_FSID and | ||||
| LAYOUTRECALL4_ALL specify that all the storage device ID to | ||||
| storage device address mappings in the affected file system(s) | ||||
| are also recalled. The respective LAYOUTRETURN with either | ||||
| LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL acknowledges to the | ||||
| server that the client invalidated the said device mappings. | ||||
| See <xref target="bulk_layouts" format="default"/> for considerations with | ||||
| "bulk" recall of layouts. | ||||
| </dd> | ||||
| <dt/> | ||||
| <dd> | ||||
| The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | ||||
| client does not hold layouts and does not have valid deviceid | ||||
| mappings. | ||||
| </dd> | ||||
| </dl> | ||||
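| <t> | ||||
| The LAYOUTRECALL4_FILE matching rules above can be illustrated | ||||
| with a minimal, non-normative Python sketch. The HeldLayout | ||||
| class and the end_of and matches_recall functions are | ||||
| hypothetical names introduced for illustration; only the | ||||
| constants come from the protocol definition. The sketch assumes | ||||
| that an iomode other than LAYOUTIOMODE4_ANY matches only layouts | ||||
| with the same iomode. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
from dataclasses import dataclass

# Protocol constants (layoutiomode4 values and NFS4_UINT64_MAX).
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2
LAYOUTIOMODE4_ANY = 3
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


@dataclass
class HeldLayout:
    """Hypothetical client-side record of one layout segment."""
    iomode: int
    offset: int
    length: int  # NFS4_UINT64_MAX means "up to the maximum file size"


def end_of(offset, length):
    """Exclusive end of a byte-range; NFS4_UINT64_MAX is unbounded."""
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    return min(offset + length, NFS4_UINT64_MAX)


def matches_recall(layout, clora_iomode, lor_offset, lor_length):
    """True if a held layout overlaps the recalled iomode and range."""
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if clora_iomode != LAYOUTIOMODE4_ANY and clora_iomode != layout.iomode:
        return False
    # Two half-open ranges overlap when each starts before the other ends.
    return (layout.offset < end_of(lor_offset, lor_length)
            and lor_offset < end_of(layout.offset, layout.length))
```
| ]]></sourcecode> | ||||
| <t> | ||||
| In this sketch, a client for which no held layout matches would | ||||
| reply with NFS4ERR_NOMATCHING_LAYOUT. | ||||
| </t> | ||||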
| <t> | ||||
| In processing the layout recall request, the client also varies | ||||
| its behavior based on the value of the clora_changed field. This | ||||
| field is used by the server to provide additional context for | ||||
| the reason why the layout is being recalled. A FALSE value for | ||||
| clora_changed indicates that no change in the layout is expected | ||||
| and the client may write modified data to the storage devices | ||||
| involved; this must be done prior to returning the layout via | ||||
| LAYOUTRETURN. A TRUE value for clora_changed indicates that the | ||||
| server is changing the layout. Examples of layout changes and | ||||
| reasons for a TRUE indication are the following: the metadata server is restriping | ||||
| the file or a permanent error has occurred on a storage device | ||||
| and the metadata server would like to provide a new layout for | ||||
| the file. Therefore, a clora_changed value of TRUE indicates | ||||
| some level of change for the layout and the client <bcp14>SHOULD NOT</bcp14> | ||||
| write and commit modified data to the storage devices. In this | ||||
| case, the client writes and commits data through the metadata | ||||
| server. | ||||
| </t> | ||||
| <t> | ||||
| See <xref target="layout_stateid" format="default"/> for a description of how the | ||||
| lor_stateid field in the arguments is to be constructed. Note | ||||
| that the "seqid" field of lor_stateid <bcp14>MUST NOT</bcp14> be zero. See Sections | ||||
| <xref target="stateid" format="counter"/>, <xref target="layout_stateid" format="counter"/>, and | ||||
| <xref target="pnfs_operation_sequencing" format="counter"/> for a further | ||||
| discussion and requirements. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_LAYOUTRECALL_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The client's processing for CB_LAYOUTRECALL is similar to | ||||
| CB_RECALL (recall of file delegations) in that | ||||
| the client responds to | ||||
| the request before actually returning layouts via the | ||||
| LAYOUTRETURN operation. While the client responds to the | ||||
| CB_LAYOUTRECALL immediately, the operation is not considered | ||||
| complete (i.e., it is considered pending) until all affected layouts are returned to the server | ||||
| via the LAYOUTRETURN operation. | ||||
| </t> | ||||
| <t> | ||||
| Before returning the layout to the server via LAYOUTRETURN, the | ||||
| client should wait for the response from in-process or in-flight | ||||
| READ, WRITE, or COMMIT operations that use the recalled layout. | ||||
| </t> | ||||
| <t> | ||||
| If the client is holding modified data that is affected by a | ||||
| recalled layout, the client has various options for writing the | ||||
| data to the server. As always, the client may write the data | ||||
| through the metadata server. In fact, the client may not have a | ||||
| choice other than writing to the metadata server when the | ||||
| clora_changed argument is TRUE and a new layout is unavailable | ||||
| from the server. However, the client may be able to write the | ||||
| modified data to the storage device if the clora_changed | ||||
| argument is FALSE; this needs to be done before returning the | ||||
| layout via LAYOUTRETURN. If the client were to obtain a new | ||||
| layout covering the modified data's byte-range, then writing to the | ||||
| storage devices is an available alternative. Note that before | ||||
| obtaining a new layout, the client must first return the | ||||
| original layout. | ||||
| </t> | ||||
| <t> | ||||
| In the case of modified data being written while the layout is | ||||
| held, the client must use LAYOUTCOMMIT operations at the | ||||
| appropriate time; as required, LAYOUTCOMMIT must be done before | ||||
| the LAYOUTRETURN. If a large amount of modified data is | ||||
| outstanding, the client may send LAYOUTRETURNs for portions of | ||||
| the recalled layout; this allows the server to monitor the | ||||
| client's progress and adherence to the original recall request. | ||||
| However, the last LAYOUTRETURN in a sequence of returns <bcp14>MUST</bcp14> | ||||
| specify the full range being recalled (see <xref target="recall_robustness" format="default"/> for details). | ||||
| </t> | ||||
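| <t> | ||||
| The ordering constraints described above can be summarized in a | ||||
| short, non-normative Python sketch. The function name, its | ||||
| parameters, and the step strings are hypothetical. The sketch | ||||
| picks one permitted option in each case (for example, writing | ||||
| to the storage devices when clora_changed is FALSE) and does not | ||||
| model partial LAYOUTRETURNs. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def plan_recall_response(clora_changed, has_dirty_data, wrote_via_layout):
    """Return one acceptable ordering of client steps for a recall.

    clora_changed    -- value of the clora_changed argument
    has_dirty_data   -- client holds modified data covered by the layout
    wrote_via_layout -- data was already written to storage devices
                        under this layout and is not yet committed
    """
    steps = [
        "reply to CB_LAYOUTRECALL",
        "wait for in-flight READ, WRITE, and COMMIT on the layout",
    ]
    if has_dirty_data and not clora_changed:
        # Layout is not changing: storage devices may still be used.
        steps.append("WRITE dirty data to storage devices")
        wrote_via_layout = True
    elif has_dirty_data:
        # Layout is changing: the client SHOULD NOT use storage devices.
        steps.append("WRITE and COMMIT dirty data through metadata server")
    if wrote_via_layout:
        # LAYOUTCOMMIT must precede the LAYOUTRETURN.
        steps.append("LAYOUTCOMMIT")
    steps.append("LAYOUTRETURN for the full recalled range")
    return steps
```
| ]]></sourcecode> | ||||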
| <t> | ||||
| If a server needs to delete a device ID and there are layouts | ||||
| referring to the device ID, CB_LAYOUTRECALL <bcp14>MUST</bcp14> be invoked to | ||||
| cause the client to return all layouts referring to the device ID | ||||
| before the server can delete the device ID. If the client | ||||
| does not return the affected layouts, the server <bcp14>MAY</bcp14> revoke | ||||
| the layouts. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_NOTIFY" numbered="true" toc="default"> | ||||
| <name>Operation 6: CB_NOTIFY - Notify Client of Directory Changes</name> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * Directory notification types. | ||||
| */ | ||||
| enum notify_type4 { | ||||
| NOTIFY4_CHANGE_CHILD_ATTRS = 0, | ||||
| NOTIFY4_CHANGE_DIR_ATTRS = 1, | ||||
| NOTIFY4_REMOVE_ENTRY = 2, | ||||
| NOTIFY4_ADD_ENTRY = 3, | ||||
| NOTIFY4_RENAME_ENTRY = 4, | ||||
| NOTIFY4_CHANGE_COOKIE_VERIFIER = 5 | ||||
| }; | ||||
| /* Changed entry information. */ | ||||
| struct notify_entry4 { | ||||
| component4 ne_file; | ||||
| fattr4 ne_attrs; | ||||
| }; | ||||
| /* Previous entry information */ | ||||
| struct prev_entry4 { | ||||
| notify_entry4 pe_prev_entry; | ||||
| /* what READDIR returned for this entry */ | ||||
| nfs_cookie4 pe_prev_entry_cookie; | ||||
| }; | ||||
| struct notify_remove4 { | ||||
| notify_entry4 nrm_old_entry; | ||||
| nfs_cookie4 nrm_old_entry_cookie; | ||||
| }; | ||||
| struct notify_add4 { | ||||
| /* | ||||
| * Information on object | ||||
| * possibly renamed over. | ||||
| */ | ||||
| notify_remove4 nad_old_entry<1>; | ||||
| notify_entry4 nad_new_entry; | ||||
| /* what READDIR would have returned for this entry */ | ||||
| nfs_cookie4 nad_new_entry_cookie<1>; | ||||
| prev_entry4 nad_prev_entry<1>; | ||||
| bool nad_last_entry; | ||||
| }; | ||||
| struct notify_attr4 { | ||||
| notify_entry4 na_changed_entry; | ||||
| }; | ||||
| struct notify_rename4 { | ||||
| notify_remove4 nrn_old_entry; | ||||
| notify_add4 nrn_new_entry; | ||||
| }; | ||||
| struct notify_verifier4 { | ||||
| verifier4 nv_old_cookieverf; | ||||
| verifier4 nv_new_cookieverf; | ||||
| }; | ||||
| /* | ||||
| * Objects of type notify_<>4 and | ||||
| * notify_device_<>4 are encoded in this. | ||||
| */ | ||||
| typedef opaque notifylist4<>; | ||||
| struct notify4 { | ||||
| /* composed from notify_type4 or notify_deviceid_type4 */ | ||||
| bitmap4 notify_mask; | ||||
| notifylist4 notify_vals; | ||||
| }; | ||||
| struct CB_NOTIFY4args { | ||||
| stateid4 cna_stateid; | ||||
| nfs_fh4 cna_fh; | ||||
| notify4 cna_changes<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_NOTIFY4res { | ||||
| nfsstat4 cnr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_NOTIFY operation is used by the server to | ||||
| send notifications to clients about changes to | ||||
| delegated directories. | ||||
| The registration of notifications for the directories | ||||
| occurs when the delegation is established using | ||||
| GET_DIR_DELEGATION. | ||||
| These notifications are sent over the backchannel. The | ||||
| notification is sent once the original request has been | ||||
| processed on the server. The server will send an array of | ||||
| notifications for changes that might have occurred in the | ||||
| directory. The notifications are sent as a list of pairs of | ||||
| bitmaps and values. | ||||
| See <xref target="fattr4" format="default"/> | ||||
| for a description of how NFSv4.1 bitmaps work. | ||||
| </t> | ||||
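| <t> | ||||
| As an illustration of how a notify_mask could be composed from | ||||
| notify_type4 values, the following non-normative Python sketch | ||||
| encodes bit numbers into a list of 32-bit words, where bit n is | ||||
| held in word n / 32 at position n mod 32. The helper names | ||||
| to_bitmap4 and from_bitmap4 are hypothetical. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# notify_type4 values from the ARGUMENT section.
NOTIFY4_CHANGE_CHILD_ATTRS = 0
NOTIFY4_CHANGE_DIR_ATTRS = 1
NOTIFY4_REMOVE_ENTRY = 2
NOTIFY4_ADD_ENTRY = 3
NOTIFY4_RENAME_ENTRY = 4
NOTIFY4_CHANGE_COOKIE_VERIFIER = 5


def to_bitmap4(bits):
    """Encode an iterable of bit numbers as a list of 32-bit words."""
    words = []
    for bit in bits:
        word, pos = divmod(bit, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << pos
    return words


def from_bitmap4(words):
    """Decode a list of 32-bit words into a sorted list of bit numbers."""
    return [w * 32 + pos
            for w, word in enumerate(words)
            for pos in range(32)
            if (word >> pos) & 1]
```
| ]]></sourcecode> | ||||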
| <t> | ||||
| If the server has more notifications than can fit in | ||||
| the CB_COMPOUND request, it <bcp14>SHOULD</bcp14> send a sequence of | ||||
| serial CB_COMPOUND requests so that the client's view | ||||
| of the directory does not become confused. For example, if the | ||||
| server indicates that a file named "foo" is added and that the | ||||
| file "foo" is removed, the order in which the client receives | ||||
| these notifications needs to be the same as the | ||||
| order in which the corresponding operations occurred on the server. | ||||
| </t> | ||||
| <t> | ||||
| If the client holding the delegation makes any | ||||
| changes in the directory that cause files or sub-directories to | ||||
| be added or removed, the server will | ||||
| notify that client of the resulting change(s). If the | ||||
| client holding the delegation is making attribute | ||||
| or cookie verifier changes only, the server does | ||||
| not need to send notifications to that client. | ||||
| The server will send the following information for | ||||
| each operation: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>NOTIFY4_ADD_ENTRY</dt> | ||||
| <dd> | ||||
| The server will send | ||||
| information about the new directory entry being created along with the | ||||
| cookie for that entry. The entry information (data type | ||||
| notify_add4) includes the component name of the entry and | ||||
| attributes. The server will send this type of entry when a | ||||
| file is actually being created, when an entry is being added | ||||
| to a directory as a result of a rename across directories | ||||
| (see below), and when a hard link is being created to an | ||||
| existing file. If this entry is added to the end of the | ||||
| directory, the server will set the nad_last_entry flag to | ||||
| TRUE. If the file is added such that there is at least one | ||||
| entry before it, the server will also return the previous | ||||
| entry information, along with its cookie, in nad_prev_entry | ||||
| (a variable-length array of up to one element; a zero-length | ||||
| array means there is no previous entry). This is to | ||||
| help clients find the right location in their file name caches and | ||||
| directory caches where this entry should be cached. If the | ||||
| new entry's cookie is available, it will be in | ||||
| the nad_new_entry_cookie (another variable-length array of up to | ||||
| one element) field. If the addition of the entry causes another | ||||
| entry to be deleted (which can only happen in the rename | ||||
| case) atomically with the addition, then information on | ||||
| this entry is reported in nad_old_entry. | ||||
| </dd> | ||||
| <dt>NOTIFY4_REMOVE_ENTRY</dt> | ||||
| <dd> | ||||
| The server will send information about the directory entry | ||||
| being deleted. The server will also send the cookie value | ||||
| for the deleted entry so that clients can get to the cached | ||||
| information for this entry. | ||||
| </dd> | ||||
| <dt>NOTIFY4_RENAME_ENTRY</dt> | ||||
| <dd> | ||||
| The server will send information about both | ||||
| the old entry and the new entry. This includes the name and | ||||
| attributes for each entry. In addition, if the rename | ||||
| causes the deletion of an entry (i.e., the case of a file | ||||
| renamed over), then this is reported in | ||||
| nrn_new_entry.nad_old_entry. | ||||
| This notification is only sent if | ||||
| both entries are in the same directory. If the rename is | ||||
| across directories, the server will send a remove | ||||
| notification to one directory and an add notification to the | ||||
| other directory, assuming both have a directory delegation. | ||||
| </dd> | ||||
| <dt>NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS</dt> | ||||
| <dd> | ||||
| The client will use the attribute | ||||
| mask to inform the server of attributes for which it wants to | ||||
| receive notifications. This change notification can be | ||||
| requested for changes to the attributes of the directory | ||||
| as well as changes to any file's attributes in the directory by | ||||
| using two separate attribute masks. The client cannot ask | ||||
| for change attribute notification for a specific file. One attribute | ||||
| mask covers all the files in the directory. Upon any | ||||
| attribute change, the server will send back the values of | ||||
| changed attributes. Notifications might not make sense for | ||||
| some file system-wide attributes, and it is up to the server to | ||||
| decide which subset it wants to support. The client can | ||||
| negotiate the frequency of attribute notifications by letting | ||||
| the server know how often it wants to be notified of an | ||||
| attribute change. The server will return supported | ||||
| notification frequencies or an indication that no | ||||
| notification is permitted for directory or child attributes | ||||
| by setting the dir_notif_delay and | ||||
| dir_entry_notif_delay attributes, respectively. | ||||
| </dd> | ||||
| <dt>NOTIFY4_CHANGE_COOKIE_VERIFIER</dt> | ||||
| <dd> | ||||
| If the cookie verifier changes while | ||||
| a client is holding a delegation, the server will notify the | ||||
| client so that it can invalidate its cookies and re-send a | ||||
| READDIR to get the new set of cookies. | ||||
| </dd> | ||||
| </dl> | ||||
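| <t> | ||||
| A client might apply add, remove, and rename notifications to a | ||||
| cached, ordered list of entry names roughly as in the following | ||||
| non-normative Python sketch. All function and parameter names | ||||
| are hypothetical, attributes and cookies are omitted, and | ||||
| placing an entry first when no previous entry is reported is an | ||||
| assumption of this sketch. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def apply_remove(names, old_name):
    """NOTIFY4_REMOVE_ENTRY: drop the deleted entry if it is cached."""
    if old_name in names:
        names.remove(old_name)
    return names


def apply_add(names, new_name, prev_name=None, last_entry=False,
              replaced=None):
    """NOTIFY4_ADD_ENTRY: insert the new entry at the reported position.

    prev_name  -- name from nad_prev_entry, if present
    last_entry -- value of nad_last_entry
    replaced   -- name from nad_old_entry (entry renamed over), if present
    """
    if replaced is not None:
        apply_remove(names, replaced)
    if last_entry:
        names.append(new_name)
    elif prev_name is not None and prev_name in names:
        names.insert(names.index(prev_name) + 1, new_name)
    else:
        # Assumption: no usable previous entry means "first in directory".
        names.insert(0, new_name)
    return names


def apply_rename(names, old_name, new_name, **add_info):
    """NOTIFY4_RENAME_ENTRY: a remove followed by an add, same directory."""
    apply_remove(names, old_name)
    return apply_add(names, new_name, **add_info)
```
| ]]></sourcecode> | ||||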
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_PUSH_DELEG" numbered="true" toc="default"> | ||||
| <name>Operation 7: CB_PUSH_DELEG - Offer Previously Requested Delegation to Client</name> | ||||
| <section toc="exclude" anchor="OP_CB_PUSH_DELEG_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_PUSH_DELEG4args { | ||||
| nfs_fh4 cpda_fh; | ||||
| open_delegation4 cpda_delegation; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_PUSH_DELEG_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_PUSH_DELEG4res { | ||||
| nfsstat4 cpdr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_PUSH_DELEG_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| CB_PUSH_DELEG is used by the server both to signal to the | ||||
| client that the delegation it wants (previously indicated | ||||
| via a want established from an | ||||
| OPEN or WANT_DELEGATION operation) is available and to | ||||
| simultaneously offer the delegation to the client. The client | ||||
| has the choice of accepting the delegation by returning | ||||
| NFS4_OK to the server, delaying the decision to accept the | ||||
| offered delegation by returning NFS4ERR_DELAY, | ||||
| or permanently rejecting the offer of the | ||||
| delegation by returning NFS4ERR_REJECT_DELEG. | ||||
| When a delegation is rejected in this fashion, the want | ||||
| previously established is permanently deleted and the delegation | ||||
| is subject to acquisition by another client. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_PUSH_DELEG_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the client does return NFS4ERR_DELAY | ||||
| and there is a conflicting delegation request, the server <bcp14>MAY</bcp14> | ||||
| process it at the expense of the client that returned | ||||
| NFS4ERR_DELAY. The client's want will not be cancelled, but | ||||
| <bcp14>MAY</bcp14> be processed behind other delegation requests or registered | ||||
| wants. | ||||
| </t> | ||||
| <t> | ||||
| When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, | ||||
| or NFS4ERR_REJECT_DELEG, the want remains pending, although | ||||
| servers may decide to cancel the want by sending a CB_WANTS_CANCELLED. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_RECALL_ANY" numbered="true" toc="default"> | ||||
| <name>Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects</name> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_ANY_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| const RCA4_TYPE_MASK_RDATA_DLG = 0; | ||||
| const RCA4_TYPE_MASK_WDATA_DLG = 1; | ||||
| const RCA4_TYPE_MASK_DIR_DLG = 2; | ||||
| const RCA4_TYPE_MASK_FILE_LAYOUT = 3; | ||||
| const RCA4_TYPE_MASK_BLK_LAYOUT = 4; | ||||
| const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; | ||||
| const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; | ||||
| const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; | ||||
| const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; | ||||
| struct CB_RECALL_ANY4args { | ||||
| uint32_t craa_objects_to_keep; | ||||
| bitmap4 craa_type_mask; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_ANY_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALL_ANY4res { | ||||
| nfsstat4 crar_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_ANY_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The server may decide that it cannot hold all of the state for | ||||
| recallable objects, such as delegations and layouts, without | ||||
| running out of resources. In such a case, while not optimal, | ||||
| the server is free to recall individual objects to reduce the load. | ||||
| </t> | ||||
| <t> | ||||
| Because the general purpose of such recallable objects as | ||||
| delegations is to eliminate client interaction with the server, | ||||
| the server cannot interpret lack of recent use as indicating | ||||
| that the object is no longer useful. The absence of visible | ||||
| use is consistent with a delegation keeping potential operations | ||||
| from being sent to the server. In the case of layouts, while it | ||||
| is true that the usefulness of a layout | ||||
| is indicated by the use of the layout when storage devices receive | ||||
| I/O requests, because there is no mandate that a storage | ||||
| device indicate to the metadata server any past or | ||||
| present use of a layout, the metadata server is not likely to know | ||||
| which layouts are good candidates to recall in response to | ||||
| low resources. | ||||
| </t> | ||||
| <t> | ||||
| In order to implement an effective reclaim scheme for such | ||||
| objects, the server's knowledge of available resources must be | ||||
| used to determine when objects must be recalled, with the | ||||
| clients selecting the actual objects to be returned. | ||||
| </t> | ||||
| <t> | ||||
| Server implementations may differ in their resource allocation | ||||
| requirements. For example, one server may share resources among | ||||
| all classes of recallable objects, whereas another may use | ||||
| separate resource pools for layouts and for delegations, or | ||||
| further separate resources by types of delegations. | ||||
| </t> | ||||
| <t> | ||||
| When a given resource pool is over-utilized, the server can | ||||
| send a CB_RECALL_ANY to clients holding recallable objects of | ||||
| the types involved, allowing each client to keep a certain number of | ||||
| such objects and return any excess. A mask specifies which | ||||
| types of objects are to be limited. The client chooses, based | ||||
| on its own knowledge of current usefulness, which of the objects | ||||
| in that class should be returned. | ||||
| </t> | ||||
| <t> | ||||
| A number of bits are defined. For some of these, ranges | ||||
| are defined and it is up to the definition of the storage | ||||
| protocol to specify how these are to be used. There are ranges | ||||
| reserved for object-based storage | ||||
| protocols and for other experimental storage | ||||
| protocols. An RFC defining such a storage protocol needs to | ||||
| specify how particular bits within its range are to be used. | ||||
| For example, it may specify a mapping between attributes of | ||||
| the layout (read vs. write, size of area) and the bit to be | ||||
| used, or it may define a field in the layout where the associated | ||||
| bit position is made available by the server to the client. | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>RCA4_TYPE_MASK_RDATA_DLG</dt> | ||||
| <dd> | ||||
| The client is to return OPEN_DELEGATE_READ delegations on | ||||
| non-directory file objects. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_WDATA_DLG</dt> | ||||
| <dd> | ||||
| The client is to return OPEN_DELEGATE_WRITE delegations on | ||||
| regular file objects. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_DIR_DLG</dt> | ||||
| <dd> | ||||
| The client is to return directory delegations. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_FILE_LAYOUT</dt> | ||||
| <dd> | ||||
| The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_BLK_LAYOUT</dt> | ||||
| <dd> | ||||
| See <xref target="RFC5663" format="default"/> for a description. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX</dt> | ||||
| <dd> | ||||
| See <xref target="RFC5664" format="default"/> for a description. | ||||
| </dd> | ||||
| <dt>RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX</dt> | ||||
| <dd> | ||||
| This range is reserved for telling the client to return | ||||
| layouts of experimental | ||||
| or site-specific layout types (see <xref target="layouttype4" format="default"/>). | ||||
| </dd> | ||||
| </dl> | ||||
| <t> | ||||
| When a bit is set in the type mask that corresponds | ||||
| to an undefined type of recallable object, | ||||
| NFS4ERR_INVAL <bcp14>MUST</bcp14> be returned. When a bit is set | ||||
| that corresponds to a defined type of object but | ||||
| the client does not support an object of the type, | ||||
| NFS4ERR_INVAL <bcp14>MUST NOT</bcp14> be returned. Future minor | ||||
| versions of NFSv4 may expand the set of valid type | ||||
| mask bits. | ||||
| </t> | ||||
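| <t> | ||||
| The type mask rules above might be checked by a client roughly | ||||
| as in the following non-normative Python sketch. The names | ||||
| DEFINED_BITS, bits_set, and check_type_mask are hypothetical. | ||||
| The sketch assumes that every bit number not covered by one of | ||||
| the RCA4_TYPE_MASK constants or ranges is undefined in this | ||||
| minor version. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
# Bit numbers covered by the RCA4_TYPE_MASK_* constants and ranges:
# 0-4 individually, 8-9 (object layouts), 12-15 (other layouts).
DEFINED_BITS = frozenset([0, 1, 2, 3, 4, 8, 9, 12, 13, 14, 15])


def bits_set(bitmap_words):
    """Return the set of bit numbers set in a list of 32-bit words."""
    return {w * 32 + pos
            for w, word in enumerate(bitmap_words)
            for pos in range(32)
            if (word >> pos) & 1}


def check_type_mask(bitmap_words, supported_bits):
    """Return (status, bits the client will act on).

    An undefined bit yields NFS4ERR_INVAL. A defined bit for an
    object type the client does not support is not an error; it is
    simply ignored.
    """
    requested = bits_set(bitmap_words)
    if requested - DEFINED_BITS:
        return "NFS4ERR_INVAL", set()
    return "NFS4_OK", requested & set(supported_bits)
```
| ]]></sourcecode> | ||||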
| <t> | ||||
| CB_RECALL_ANY specifies a count of objects that the client may | ||||
| keep as opposed to a count that the client must return. This | ||||
| is to avoid a potential race between a CB_RECALL_ANY that had a | ||||
| count of objects to free and a set of client-originated | ||||
| operations to return layouts or delegations. As a result of the | ||||
| race, the client and server would have differing ideas as to how | ||||
| many objects to return. Hence, the client could mistakenly free | ||||
| too many. | ||||
| </t> | ||||
| <t> | ||||
| If resource demands prompt it, the server may send another | ||||
| CB_RECALL_ANY with a lower count, even if it has not yet received | ||||
| an acknowledgment from the client for a previous CB_RECALL_ANY | ||||
| with the same type mask. Although the possibility exists that | ||||
| these will be received by the client in an order different from | ||||
| the order in which they were sent, any such permutation of | ||||
| the callback stream is harmless. It is the job of the client | ||||
| to bring down the size of the recallable object set in line | ||||
| with each CB_RECALL_ANY received, and until that obligation is | ||||
| met, it cannot be cancelled or modified by any subsequent | ||||
| CB_RECALL_ANY for the same type mask. Thus, if the server | ||||
| sends two CB_RECALL_ANYs, the effect will be the same as | ||||
| if the lower count was sent, whatever the order of recall | ||||
| receipt. Note that this means that a server may not cancel | ||||
| the effect of a CB_RECALL_ANY by sending another recall with | ||||
| a higher count. When a CB_RECALL_ANY is received and the | ||||
| count is already within the limit set or is above a limit | ||||
| that the client is working to get down to, that callback has no | ||||
| effect. | ||||
| </t> | ||||
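| <t> | ||||
| The order-independence described above follows from treating | ||||
| the outstanding obligation as the minimum of the counts | ||||
| received. A non-normative Python sketch, in which the function | ||||
| and parameter names are hypothetical and the caller is assumed | ||||
| to clear the pending limit once it has been met: | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def effective_limit(pending, objects_to_keep, held):
    """Return the limit the client must work toward, or None.

    pending         -- limit from an earlier, not yet satisfied
                       CB_RECALL_ANY for the same type mask, or None
    objects_to_keep -- craa_objects_to_keep from the new callback
    held            -- number of matching objects currently held
    """
    if pending is not None:
        # An unmet obligation cannot be raised, only lowered.
        return min(pending, objects_to_keep)
    if held <= objects_to_keep:
        # Already within the limit: the callback has no effect.
        return None
    return objects_to_keep
```
| ]]></sourcecode> | ||||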
| <t> | ||||
| Servers are generally free to deny recallable objects | ||||
| when insufficient resources are available. Note that the | ||||
| effect of such a policy is implicitly to give precedence to | ||||
| existing objects relative to requested ones, with the result | ||||
| that resources might not be optimally used. To prevent this, | ||||
| servers are well advised to make the point at which they start | ||||
| sending CB_RECALL_ANY callbacks somewhat below that at which they | ||||
| cease to give out new delegations and layouts. This allows the | ||||
| client to purge its less-used objects whenever appropriate and | ||||
| so continue to have its subsequent requests given new resources | ||||
| freed up by object returns. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_ANY_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The client can choose to return any type of object specified | ||||
| by the mask. If a server wishes to limit the use of objects of a | ||||
| specific type, it should only specify that type in the mask | ||||
| it sends. Should the client fail to return requested objects, it is | ||||
| up to the server to handle this situation, typically by sending | ||||
| specific recalls (i.e., sending CB_RECALL operations) | ||||
| to properly limit resource usage. The server | ||||
| should give the client enough time to return objects before | ||||
| proceeding to specific recalls. This time should not be less | ||||
| than the lease period. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_RECALLABLE_OBJ_AVAIL" numbered="true" toc="default"> | ||||
| <name>Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for Recallable Objects</name> | ||||
| <section toc="exclude" anchor="OP_CB_RECALLABLE_OBJ_AVAIL_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALLABLE_OBJ_AVAIL_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALLABLE_OBJ_AVAIL4res { | ||||
| nfsstat4 croa_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALLABLE_OBJ_AVAIL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the | ||||
| client that the server has resources to grant recallable | ||||
| objects that might previously have been denied by OPEN, | ||||
| WANT_DELEGATION, GET_DIR_DELEGATION, or LAYOUTGET. | ||||
| </t> | ||||
| <t> | ||||
| The argument craa_objects_to_keep means the total number of | ||||
| recallable objects of the types indicated in the argument | ||||
| craa_type_mask that the server believes it can allow the client to | ||||
| have, including the number of such objects the client already | ||||
| has. A client that tries to acquire more recallable objects | ||||
| than the server has indicated it can have runs the risk of having | ||||
| objects recalled. | ||||
| </t> | ||||
| <t> | ||||
| The server is not obligated to reserve the | ||||
| difference between the number of the objects | ||||
| the client currently has and the value of | ||||
| craa_objects_to_keep, nor does delaying the reply | ||||
| to CB_RECALLABLE_OBJ_AVAIL prevent the server | ||||
| from using the resources of the recallable objects | ||||
| for another purpose. Indeed, if a client responds | ||||
| slowly to CB_RECALLABLE_OBJ_AVAIL, the server might | ||||
| interpret the client as having reduced capability | ||||
| to manage recallable objects, and so cancel | ||||
| or reduce any reservation it is maintaining on behalf | ||||
| of the client. | ||||
| Thus, if the client desires to acquire more | ||||
| recallable objects, it needs to reply quickly | ||||
| to CB_RECALLABLE_OBJ_AVAIL, and then send the | ||||
| appropriate operations to acquire recallable | ||||
| objects. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_RECALL_SLOT" numbered="true" toc="default"> | ||||
| <name>Operation 10: CB_RECALL_SLOT - Change Flow Control Limits</name> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_SLOT_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALL_SLOT4args { | ||||
| slotid4 rsa_target_highest_slotid; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_SLOT_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_RECALL_SLOT4res { | ||||
| nfsstat4 rsr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_SLOT_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_RECALL_SLOT operation requests the client to | ||||
| return session slots, and if applicable, transport | ||||
| credits (e.g., RDMA credits for connections associated with | ||||
| the operations channel) of the session's fore channel. | ||||
| CB_RECALL_SLOT specifies | ||||
| rsa_target_highest_slotid, the value of the target highest slot ID the server wants | ||||
| for the session. The client <bcp14>MUST</bcp14> then progress toward reducing | ||||
| the session's highest slot ID to the target value. | ||||
| </t> | ||||
| <t> | ||||
| If the session has only non-RDMA connections associated with its | ||||
| operations channel, then the client need only wait | ||||
| for all outstanding requests with a slot ID > | ||||
| rsa_target_highest_slotid to complete, then send | ||||
| a single COMPOUND consisting of a single SEQUENCE operation, | ||||
| with the sa_highest_slotid field set to rsa_target_highest_slotid. | ||||
| If there are RDMA-based connections associated with | ||||
| the operations channel, then the client also needs to | ||||
| send enough zero-length "RDMA Send" messages to take the total | ||||
| <!-- [auth] Please leave this use of "Send" capitalized in order to denote | ||||
| an artifact particular to RDMA-based communication. Thanks. --> | ||||
| RDMA credit count to rsa_target_highest_slotid + 1 or below. | ||||
| </t> | ||||
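| <t> | ||||
| The client-side procedure above can be outlined in a short, | ||||
| non-normative Python sketch. The function name, its parameters, | ||||
| and the keys of the returned dictionary are hypothetical. The | ||||
| sketch only records the RDMA credit ceiling; it does not model | ||||
| how many zero-length messages are needed to reach it. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def plan_slot_recall(outstanding_slot_ids, rsa_target_highest_slotid,
                     uses_rdma=False):
    """Outline the client's response to CB_RECALL_SLOT."""
    target = rsa_target_highest_slotid
    plan = {
        # Requests on slots above the target must complete first.
        "wait_for_slots": sorted(s for s in outstanding_slot_ids
                                 if s > target),
        # Then a COMPOUND with a single SEQUENCE announces the new
        # highest slot ID.
        "sequence_highest_slotid": target,
    }
    if uses_rdma:
        # Total RDMA credits must come down to target + 1 or below.
        plan["rdma_credit_ceiling"] = target + 1
    return plan
```
| ]]></sourcecode> | ||||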
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_RECALL_SLOT_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If the client fails to reduce the highest slot it has on the fore channel | ||||
| to what the server requests, the server can force the issue | ||||
| by asserting flow control on the receive side of | ||||
| all connections bound to the fore channel, and then | ||||
| finish servicing all outstanding requests that are | ||||
| in slots greater than rsa_target_highest_slotid. Once that | ||||
| is done, the server can then open the flow control, and any time | ||||
| the client sends a new request on a slot greater than | ||||
| rsa_target_highest_slotid, the server can return NFS4ERR_BADSLOT. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="OP_CB_SEQUENCE" numbered="true" toc="default"> | ||||
| <name>Operation 11: CB_SEQUENCE - Supply Backchannel Sequencing and Control</name> | ||||
| <section toc="exclude" anchor="OP_CB_SEQUENCE_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct referring_call4 { | ||||
| sequenceid4 rc_sequenceid; | ||||
| slotid4 rc_slotid; | ||||
| }; | ||||
| struct referring_call_list4 { | ||||
| sessionid4 rcl_sessionid; | ||||
| referring_call4 rcl_referring_calls<>; | ||||
| }; | ||||
| struct CB_SEQUENCE4args { | ||||
| sessionid4 csa_sessionid; | ||||
| sequenceid4 csa_sequenceid; | ||||
| slotid4 csa_slotid; | ||||
| slotid4 csa_highest_slotid; | ||||
| bool csa_cachethis; | ||||
| referring_call_list4 csa_referring_call_lists<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_SEQUENCE_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_SEQUENCE4resok { | ||||
| sessionid4 csr_sessionid; | ||||
| sequenceid4 csr_sequenceid; | ||||
| slotid4 csr_slotid; | ||||
| slotid4 csr_highest_slotid; | ||||
| slotid4 csr_target_highest_slotid; | ||||
| }; | ||||
| union CB_SEQUENCE4res switch (nfsstat4 csr_status) { | ||||
| case NFS4_OK: | ||||
| CB_SEQUENCE4resok csr_resok4; | ||||
| default: | ||||
| void; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_SEQUENCE_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_SEQUENCE operation is used to manage operational accounting | ||||
| for the backchannel of the session on which a request is | ||||
| sent. The contents include the session ID to which this | ||||
| request belongs, the slot ID and sequence ID used by the server to | ||||
| implement session request control and exactly once | ||||
| semantics, and exchanged slot ID maxima that are used to adjust the | ||||
| size of the reply cache. In each CB_COMPOUND request, CB_SEQUENCE | ||||
| <bcp14>MUST</bcp14> appear once and <bcp14>MUST</bcp14> be the first operation. The error | ||||
| NFS4ERR_SEQUENCE_POS <bcp14>MUST</bcp14> be returned when CB_SEQUENCE is found in | ||||
| any position in a CB_COMPOUND beyond the first. If any | ||||
| other operation is in the first position of CB_COMPOUND, | ||||
| NFS4ERR_OP_NOT_IN_SESSION <bcp14>MUST</bcp14> be returned. | ||||
| </t> | ||||
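| <t> | ||||
| The positional rules above can be sketched as a small check over the | ||||
| operation names of a CB_COMPOUND. This is a hedged illustration: the | ||||
| function name is hypothetical, the statuses are represented as strings | ||||
| rather than nfsstat4 values, and the sketch assumes a request that | ||||
| carries at least one operation. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
def check_cb_sequence_position(op_names):
    """Return the status implied by where CB_SEQUENCE appears."""
    if not op_names:
        raise ValueError("sketch assumes at least one operation")
    # Any other operation in the first position is rejected.
    if op_names[0] != "CB_SEQUENCE":
        return "NFS4ERR_OP_NOT_IN_SESSION"
    # CB_SEQUENCE found anywhere beyond the first position is rejected.
    if "CB_SEQUENCE" in op_names[1:]:
        return "NFS4ERR_SEQUENCE_POS"
    return "NFS4_OK"
```
| ]]></sourcecode> | ||||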
| <t> | ||||
| See <xref target="OP_SEQUENCE_DESCRIPTION" format="default"/> for a description of | ||||
| how slots are processed. | ||||
| </t> | ||||
| <t> | ||||
| If csa_cachethis is TRUE, then the server is requesting that | ||||
| the client cache the reply in the callback reply cache. The client <bcp14>MUST</bcp14> | ||||
| cache the reply (see <xref target="optional_reply_caching" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| The csa_referring_call_lists array is the list of COMPOUND | ||||
| requests, identified by session ID, slot ID, and sequence ID. These | ||||
| are requests that the client previously sent to the server. | ||||
| These previous requests created state that some operation(s) | ||||
| in the same CB_COMPOUND as the csa_referring_call_lists are | ||||
| identifying. | ||||
| A session ID is included because | ||||
| leased state is tied to a client ID, and a client ID can have | ||||
| multiple sessions. See | ||||
| <xref target="sessions_callback_races" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The value of the csa_sequenceid argument relative to | ||||
| the cached sequence ID on the slot falls into one | ||||
| of three cases. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| If the difference between csa_sequenceid and | ||||
| the client's cached sequence ID at the slot ID | ||||
| is two (2) or more, | ||||
| or if csa_sequenceid is less | ||||
| than the cached sequence ID (accounting | ||||
| for wraparound of the unsigned sequence ID value), | ||||
| then the client <bcp14>MUST</bcp14> return NFS4ERR_SEQ_MISORDERED. | ||||
| </li> | ||||
| <li> | ||||
| If csa_sequenceid and the cached sequence ID are the | ||||
| same, this is a retry, and the client returns the | ||||
| CB_COMPOUND request's cached reply. | ||||
| </li> | ||||
| <li> | ||||
| If csa_sequenceid is one greater (accounting for | ||||
| wraparound) than the cached sequence ID, then | ||||
| this is a new request, and the slot's sequence | ||||
| ID is incremented. The operations subsequent to | ||||
| CB_SEQUENCE, if any, are processed. If there are no | ||||
| other operations, the only other effects are to | ||||
| cache the CB_SEQUENCE reply in the slot, maintain the | ||||
| session's activity, and when the server receives the | ||||
| CB_SEQUENCE reply, renew the lease of state | ||||
| related to the client ID. | ||||
| </li> | ||||
| </ul> | ||||
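| <t> | ||||
| The three cases above reduce to a modular difference between the two | ||||
| sequence IDs. The following Python sketch is illustrative only; the | ||||
| function name and the string results "retry" and "new" are | ||||
| hypothetical, and it relies on sequenceid4 being an unsigned 32-bit | ||||
| value. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
SEQ_MODULUS = 2 ** 32  # sequenceid4 is an unsigned 32-bit value


def classify_cb_sequenceid(csa_sequenceid, cached_sequenceid):
    """Classify a CB_SEQUENCE against the slot's cached sequence ID."""
    # Modular subtraction accounts for wraparound of the counter.
    delta = (csa_sequenceid - cached_sequenceid) % SEQ_MODULUS
    if delta == 0:
        return "retry"   # return the cached reply
    if delta == 1:
        return "new"     # increment the slot's sequence ID and execute
    # Two or more ahead, or behind the cached value.
    return "NFS4ERR_SEQ_MISORDERED"
```
| ]]></sourcecode> | ||||
| <t> | ||||
| In this sketch, a cached sequence ID of 0xFFFFFFFF followed by a | ||||
| csa_sequenceid of 0 is classified as a new request, which is the | ||||
| wraparound behavior the list above calls for. | ||||
| </t> | ||||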
| <t> | ||||
| If the server reuses a slot ID and sequence ID for | ||||
| a completely different request, the client <bcp14>MAY</bcp14> | ||||
| treat the request as if it is a retry | ||||
| of what it has already executed. Alternatively, the client <bcp14>MAY</bcp14> | ||||
| detect the server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | ||||
| </t> | ||||
| <t> | ||||
| If CB_SEQUENCE returns an error, then the state of the slot (sequence ID, | ||||
| cached reply) <bcp14>MUST NOT</bcp14> change. | ||||
| See <xref target="optional_reply_caching" format="default"/> for the conditions when the | ||||
| error NFS4ERR_RETRY_UNCACHED_REP might be returned. | ||||
| </t> | ||||
| <t> | ||||
| The client returns two "highest_slotid" values: | ||||
| csr_highest_slotid and csr_target_highest_slotid. The | ||||
| former is the highest slot ID the client will accept | ||||
| in a future CB_SEQUENCE operation, and <bcp14>SHOULD NOT</bcp14> be | ||||
| less than the value of csa_highest_slotid (but see | ||||
| <xref target="Slot_Identifiers_and_Server_Reply_Cache" format="default"/> for an exception). The latter is the highest slot | ||||
| ID the client would prefer the server use on a future | ||||
| CB_SEQUENCE operation. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="OP_CB_WANTS_CANCELLED" numbered="true" toc="default"> | ||||
| <name>Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation Wants</name> | ||||
| <section toc="exclude" anchor="OP_CB_WANTS_CANCELLED_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_WANTS_CANCELLED4args { | ||||
| bool cwca_contended_wants_cancelled; | ||||
| bool cwca_resourced_wants_cancelled; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_WANTS_CANCELLED_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_WANTS_CANCELLED4res { | ||||
| nfsstat4 cwcr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_WANTS_CANCELLED_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_WANTS_CANCELLED operation is used to notify the client that | ||||
| some or all of the wants it registered for recallable delegations and layouts | ||||
| have been cancelled. | ||||
| </t> | ||||
| <t> | ||||
| If cwca_contended_wants_cancelled is TRUE, this indicates that | ||||
| the server will not be pushing to the client any delegations | ||||
| that become available after contention passes. | ||||
| </t> | ||||
| <t> | ||||
| If cwca_resourced_wants_cancelled is TRUE, this indicates that | ||||
| the server will not notify the client when there are resources | ||||
| on the server to grant delegations or layouts. | ||||
| </t> | ||||
| <t> | ||||
| After receiving a CB_WANTS_CANCELLED operation, the | ||||
| client is free to attempt to acquire the delegations or | ||||
| layouts it was waiting for, and possibly re-register wants. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_WANTS_CANCELLED_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| If a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION request | ||||
| outstanding when a CB_WANTS_CANCELLED is sent, the server may need to | ||||
| make clear to the client whether a promise to signal delegation availability | ||||
| happened before the CB_WANTS_CANCELLED and is thus covered by it, or after | ||||
| the CB_WANTS_CANCELLED, in which case it is not covered by it. The server | ||||
| can make this distinction by putting the appropriate requests into the | ||||
| list of referring calls in the associated CB_SEQUENCE. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="OP_CB_NOTIFY_LOCK" numbered="true" toc="default"> | ||||
| <name>Operation 13: CB_NOTIFY_LOCK - Notify Client of Possible Lock Availability</name> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_LOCK_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_NOTIFY_LOCK4args { | ||||
| nfs_fh4 cnla_fh; | ||||
| lock_owner4 cnla_lock_owner; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_LOCK_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_NOTIFY_LOCK4res { | ||||
| nfsstat4 cnlr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_LOCK_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The server can use this operation to indicate that a byte-range lock for the given | ||||
| file and lock-owner, previously requested by the client via an unsuccessful | ||||
| LOCK operation, might be available. | ||||
| </t> | ||||
| <t> | ||||
| This callback is meant to be used by servers to help reduce the latency of | ||||
| blocking locks in the case where they recognize that a client that has | ||||
| been polling for a blocking byte-range lock may now be able to acquire the lock. | ||||
| If the server supports this callback for a given file, it <bcp14>MUST</bcp14> set the | ||||
| OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to successful opens | ||||
| for that file. This does not commit the server to the use of CB_NOTIFY_LOCK, | ||||
| but the client may use this as a hint to decide how frequently to poll | ||||
| for locks derived from that open. | ||||
| </t> | ||||
| <t> | ||||
| If an OPEN operation results in an upgrade, in which the stateid returned | ||||
| has an "other" value matching that of a stateid already allocated, with a | ||||
| new "seqid" indicating a change in the lock being represented, then the | ||||
| value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to that new | ||||
| OPEN controls handling from that point going forward. When parallel OPENs | ||||
| are done on the same file and open-owner, the ordering of the "seqid" fields | ||||
| of the returned stateids (subject to wraparound) is to be used to select | ||||
| the controlling value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_LOCK_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| The server <bcp14>MUST NOT</bcp14> grant the byte-range lock to the client unless and until it | ||||
| receives a LOCK operation from the client. Similarly, the client | ||||
| receiving this callback cannot assume that it now has the lock or that a | ||||
| subsequent LOCK operation for the lock will be successful. | ||||
| </t> | ||||
| <t> | ||||
| The server is not required to implement this callback, and even if it | ||||
| does, it is not required to use it in any particular case. Therefore, the | ||||
| client must still rely on polling for blocking locks, as described in | ||||
| <xref target="blocking_locks" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Similarly, the client is not required to implement this callback, and even | ||||
| if it does, it is still free to ignore it. Therefore, the server <bcp14>MUST NOT</bcp14> assume | ||||
| that the client will act based on the callback. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="OP_CB_NOTIFY_DEVICEID" numbered="true" toc="default"> | ||||
| <name>Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device ID Changes</name> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_DEVICEID_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * Device notification types. | ||||
| */ | ||||
| enum notify_deviceid_type4 { | ||||
| NOTIFY_DEVICEID4_CHANGE = 1, | ||||
| NOTIFY_DEVICEID4_DELETE = 2 | ||||
| }; | ||||
| /* For NOTIFY_DEVICEID4_DELETE */ | ||||
| struct notify_deviceid_delete4 { | ||||
| layouttype4 ndd_layouttype; | ||||
| deviceid4 ndd_deviceid; | ||||
| }; | ||||
| /* For NOTIFY_DEVICEID4_CHANGE */ | ||||
| struct notify_deviceid_change4 { | ||||
| layouttype4 ndc_layouttype; | ||||
| deviceid4 ndc_deviceid; | ||||
| bool ndc_immediate; | ||||
| }; | ||||
| struct CB_NOTIFY_DEVICEID4args { | ||||
| notify4 cnda_changes<>; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_DEVICEID_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| struct CB_NOTIFY_DEVICEID4res { | ||||
| nfsstat4 cndr_status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_NOTIFY_DEVICEID_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| The CB_NOTIFY_DEVICEID operation is used by the | ||||
| server to send notifications to clients about | ||||
| changes to pNFS device IDs. The registration of | ||||
| device ID notifications is optional and is done via | ||||
| GETDEVICEINFO. These notifications are sent | ||||
| over the backchannel | ||||
| once the original request has been processed | ||||
| on the server. The server will send an array of | ||||
| notifications, cnda_changes, as a list of pairs of | ||||
| bitmaps and values. See <xref target="fattr4" format="default"/> | ||||
| for a description of how NFSv4.1 bitmaps work. | ||||
| </t> | ||||
| <t> | ||||
| As with CB_NOTIFY (<xref target="OP_CB_NOTIFY_DESCRIPTION" format="default"/>), it is | ||||
| possible the server has more notifications than | ||||
| can fit in a CB_COMPOUND, thus requiring multiple | ||||
| CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not | ||||
| an issue because unlike directory entries, device | ||||
| IDs cannot be re-used after being deleted (<xref target="device_ids" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| All device ID notifications contain a device ID and a | ||||
| layout type. The layout type is necessary because two | ||||
| different layout types can share the same device ID, | ||||
| and the common device ID can have completely different | ||||
| mappings for each layout type. | ||||
| </t> | ||||
| <t> | ||||
| The server will send the following notifications: | ||||
| </t> | ||||
| <dl newline="true" spacing="normal"> | ||||
| <dt>NOTIFY_DEVICEID4_CHANGE</dt> | ||||
| <dd> | ||||
| A previously provided device-ID-to-device-address | ||||
| mapping has changed and the client uses | ||||
| GETDEVICEINFO to obtain the | ||||
| updated mapping. | ||||
| The notification is encoded in a value of data | ||||
| type notify_deviceid_change4. This data type | ||||
| also contains a boolean field, ndc_immediate, | ||||
| which if TRUE indicates that the change will be | ||||
| enforced immediately, and so the client might not | ||||
| be able to complete any pending I/O to the device | ||||
| ID. If ndc_immediate is FALSE, then for an | ||||
| indefinite time, the client can complete pending | ||||
| I/O. After pending I/O is complete, the client | ||||
| <bcp14>SHOULD</bcp14> get the new device-ID-to-device-address | ||||
| mappings before sending new I/O requests to the | ||||
| storage devices addressed by the device ID. | ||||
| </dd> | ||||
| <dt>NOTIFY_DEVICEID4_DELETE</dt> | ||||
| <dd> | ||||
| <t> | ||||
| Deletes a device ID from the mappings. This | ||||
| notification <bcp14>MUST NOT</bcp14> be sent if the client has | ||||
| a layout that refers to the device ID. In other | ||||
| words, if the server is sending a delete device ID | ||||
| notification, one of the following is true for layouts | ||||
| associated with the layout type: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The client never had a layout referring to that device ID. | ||||
| </li> | ||||
| <li> | ||||
| The client has returned all layouts referring to that device ID. | ||||
| </li> | ||||
| <li> | ||||
| The server has revoked all layouts referring to that device ID. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The notification is encoded in a value of data | ||||
| type notify_deviceid_delete4. | ||||
| After a server deletes a device ID, it <bcp14>MUST NOT</bcp14> | ||||
| reuse that device ID for the same layout type until the | ||||
| client ID is deleted. | ||||
| </t> | ||||
| </dd> | ||||
| </dl> | ||||
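| <t> | ||||
| As a hedged illustration of how a client might act on these two | ||||
| notifications, the following Python sketch keeps a device cache keyed | ||||
| by the pair (layout type, device ID), since the same device ID can map | ||||
| differently under each layout type. The cache layout, the function | ||||
| name, and the returned strings are assumptions made for this example; | ||||
| only the two enum values come from the XDR above. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
NOTIFY_DEVICEID4_CHANGE = 1
NOTIFY_DEVICEID4_DELETE = 2


def apply_deviceid_notification(cache, kind, layouttype, deviceid,
                                immediate=False):
    """Apply one notification to a hypothetical client device cache.

    cache maps (layouttype, deviceid) to a dict describing the mapping.
    """
    key = (layouttype, deviceid)
    if kind == NOTIFY_DEVICEID4_DELETE:
        # The server only sends this when no layout refers to the device.
        cache.pop(key, None)
        return "deleted"
    if kind == NOTIFY_DEVICEID4_CHANGE:
        entry = cache.get(key)
        if entry is None:
            return "unknown"
        # The mapping must be refetched with GETDEVICEINFO before new I/O.
        entry["stale"] = True
        # With ndc_immediate TRUE, pending I/O might not complete.
        entry["pending_io_safe"] = not immediate
        return "refresh_needed"
    raise ValueError("unknown notification type")
```
| ]]></sourcecode> | ||||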
| </section> | ||||
| </section> | ||||
| <section anchor="OP_CB_ILLEGAL" numbered="true" toc="default"> | ||||
| <name>Operation 10044: CB_ILLEGAL - Illegal Callback Operation</name> | ||||
| <section toc="exclude" anchor="OP_CB_ILLEGAL_ARGUMENT" numbered="true"> | ||||
| <name>ARGUMENT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| void; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_ILLEGAL_RESULT" numbered="true"> | ||||
| <name>RESULT</name> | ||||
| <sourcecode type="xdr"><![CDATA[ | ||||
| /* | ||||
| * CB_ILLEGAL: Response for illegal operation numbers | ||||
| */ | ||||
| struct CB_ILLEGAL4res { | ||||
| nfsstat4 status; | ||||
| }; | ||||
| ]]></sourcecode> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_ILLEGAL_DESCRIPTION" numbered="true"> | ||||
| <name>DESCRIPTION</name> | ||||
| <t> | ||||
| This operation is a placeholder for encoding a | ||||
| result to handle the case of the server sending | ||||
| an operation code within CB_COMPOUND that is not | ||||
| defined in the NFSv4.1 specification. See <xref target="OP_CB_COMPOUND_DESCRIPTION" format="default"/> for more details. | ||||
| </t> | ||||
| <t> | ||||
| The status field of CB_ILLEGAL4res <bcp14>MUST</bcp14> be set to | ||||
| NFS4ERR_OP_ILLEGAL. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" anchor="OP_CB_ILLEGAL_IMPLEMENTATION" numbered="true"> | ||||
| <name>IMPLEMENTATION</name> | ||||
| <t> | ||||
| A server will probably not send an operation with code | ||||
| OP_CB_ILLEGAL, but if it does, the response will be CB_ILLEGAL4res | ||||
| just as it would be with any other invalid operation code. Note | ||||
| that if the client gets an illegal operation code that is not | ||||
| OP_CB_ILLEGAL, and if the client checks for legal operation codes | ||||
| during the XDR decode phase, then an instance of | ||||
| data type CB_ILLEGAL4res will not be returned. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SECCON" numbered="true" toc="default"> | ||||
| <name>Security Considerations</name> | ||||
| <t> | ||||
| Historically, the authentication model of NFS | ||||
| was based on the entire machine being the NFS client, with the | ||||
| NFS server trusting the NFS client | ||||
| to authenticate the end-user. | ||||
| The NFS server in turn shared its files only to | ||||
| specific clients, as identified by the client's source | ||||
| network address. Given this model, the AUTH_SYS | ||||
| RPC security flavor simply identified the end-user | ||||
| using the client to the NFS server. When processing | ||||
| NFS responses, the client ensured that the responses | ||||
| came from the same network address and port number | ||||
| to which the request was sent. While such a model is | ||||
| easy to implement and simple to deploy and use, it is | ||||
| unsafe. Thus, NFSv4.1 | ||||
| implementations are <bcp14>REQUIRED</bcp14> to support a security model that uses | ||||
| end-to-end authentication, where an end-user on a client | ||||
| mutually authenticates (via cryptographic schemes that | ||||
| do not expose passwords or keys in the clear on the | ||||
| network) to a principal on an NFS server. Consideration | ||||
| is also given to the integrity and privacy of | ||||
| NFS requests and responses. The issues of end-to-end | ||||
| mutual authentication, integrity, and privacy are | ||||
| discussed in <xref target="RPCSEC_GSS_and_Security_Services" format="default"/>. | ||||
| There are specific considerations when using Kerberos V5 as described | ||||
| in <xref target="krb5_sec_consider" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| Note that being <bcp14>REQUIRED</bcp14> to implement does not mean <bcp14>REQUIRED</bcp14> to | ||||
| use; AUTH_SYS can be used by NFSv4.1 clients and servers. | ||||
| However, AUTH_SYS is merely an <bcp14>OPTIONAL</bcp14> security flavor in NFSv4.1, | ||||
| and so interoperability via AUTH_SYS is not assured. | ||||
| </t> | ||||
| <t> | ||||
| For reasons of reduced administration overhead, better | ||||
| performance, and/or reduction of CPU utilization, | ||||
| users of NFSv4.1 implementations might decline to use | ||||
| security mechanisms that enable integrity protection | ||||
| on each remote procedure call and response. The | ||||
| use of mechanisms without integrity leaves the user | ||||
| vulnerable to a man-in-the-middle between the NFS | ||||
| client and server that modifies the RPC request and/or | ||||
| the response. While implementations are free to provide | ||||
| the option to use weaker security mechanisms, there | ||||
| are three operations in particular that warrant the | ||||
| implementation overriding user choices. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The first two such operations are SECINFO and | ||||
| SECINFO_NO_NAME. It is <bcp14>RECOMMENDED</bcp14> that the client send | ||||
| both operations such that they are protected with a | ||||
| security flavor that has integrity protection, such | ||||
| as RPCSEC_GSS with either the rpc_gss_svc_integrity | ||||
| or rpc_gss_svc_privacy service. Without integrity | ||||
| protection encapsulating SECINFO and SECINFO_NO_NAME | ||||
| and their results, a man-in-the-middle could | ||||
| modify results such that the client might select a | ||||
| weaker algorithm in the set allowed by the server, making | ||||
| the client and/or server vulnerable to further attacks. | ||||
| </li> | ||||
| <li> | ||||
| The third operation that <bcp14>SHOULD</bcp14> use integrity protection | ||||
| is any GETATTR for the fs_locations and fs_locations_info attributes, | ||||
| in order to mitigate the severity of a man-in-the-middle attack. | ||||
| The attack has two | ||||
| steps. First the attacker modifies the unprotected results of some | ||||
| operation to return NFS4ERR_MOVED. Second, when the client follows up | ||||
| with a GETATTR for the fs_locations or fs_locations_info attributes, | ||||
| the attacker modifies | ||||
| the results to cause the client to migrate its traffic to a server | ||||
| controlled by the attacker. With integrity protection, this attack is mitigated. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Relative to previous NFS versions, NFSv4.1 has additional security | ||||
| considerations for pNFS (see Sections <xref target="security_considerations_pnfs" format="counter"/> | ||||
| and <xref target="file_security_considerations" format="counter"/>), locking | ||||
| and session state (see <xref target="protect_state_change" format="default"/>), | ||||
| and state recovery during grace period (see <xref target="reclaim_security_considerations" format="default"/>). | ||||
| With respect to locking and session state, if SP4_SSV state protection | ||||
| is being used, <xref target="rpcsec_ssv_consider" format="default"/> has specific | ||||
| security considerations for the NFSv4.1 client and server. | ||||
| </t> | ||||
| <t> | ||||
| Security considerations for lock reclaim differ between the two different | ||||
| situations in which state reclaim is to be done. | ||||
| The server failure situation is discussed in | ||||
| <xref target="reclaim_security_considerations" format="default"/>, while the per-fs state | ||||
| reclaim done in support of migration/replication is discussed in | ||||
| <xref target="SEC11-EFF-lock-sc" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| The use of the multi-server namespace features described in | ||||
| <xref target="NEW11" format="default"/> raises | ||||
| the possibility that requests to determine the set of network | ||||
| addresses corresponding to a given server might be interfered | ||||
| with or have their responses modified in flight. | ||||
| In light of this possibility, the following considerations | ||||
| should be noted: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| When DNS is used to convert server names to addresses and | ||||
| DNSSEC <xref target="RFC4033" format="default"/> is not available, the validity of | ||||
| the network addresses returned generally cannot be relied upon. | ||||
| However, when combined with a trusted resolver, DNS over TLS | ||||
| <xref target="RFC7858" format="default"/> and DNS over HTTPS | ||||
| <xref target="RFC8484" format="default"/> can be relied upon to provide | ||||
| valid address resolutions. | ||||
| </t> | ||||
| <t> | ||||
| In situations in which the validity of the provided addresses | ||||
| cannot be relied upon and the client uses RPCSEC_GSS to access the | ||||
| designated server, it is possible for mutual authentication to | ||||
| discover invalid server addresses as long as the RPCSEC_GSS | ||||
| implementation used does not use insecure DNS queries to canonicalize | ||||
| the hostname components of the service principal names, as | ||||
| explained in <xref target="RFC4120" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| The fetching of attributes containing file system location | ||||
| information <bcp14>SHOULD</bcp14> be | ||||
| performed using integrity protection. It is important to note here that | ||||
| a client making a request of this sort without using | ||||
| integrity protection needs to be aware of | ||||
| the negative consequences of doing so, which can lead to | ||||
| invalid hostnames or network addresses being returned. These | ||||
| include cases in which the | ||||
| client is directed to a server under the control of an | ||||
| attacker, who might get access to data written or provide | ||||
| incorrect values for data read. In light of | ||||
| this, the client needs to recognize that using such returned | ||||
| location information to access an NFSv4 server | ||||
| without use of RPCSEC_GSS (i.e., | ||||
| by using AUTH_SYS) poses dangers as it can result in the client | ||||
| interacting with such an attacker-controlled server without | ||||
| any authentication facilities to verify the server's identity. | ||||
| </li> | ||||
| <li> | ||||
| Despite the fact that it is a requirement that implementations provide | ||||
| "support" for use of RPCSEC_GSS, it cannot be assumed that | ||||
| use of RPCSEC_GSS is always available between any particular | ||||
| client-server pair. | ||||
| </li> | ||||
| <li> | ||||
| When a client has the network addresses of a server but not the | ||||
| associated hostnames, that would interfere with its ability | ||||
| to use RPCSEC_GSS. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In light of the above, a server <bcp14>SHOULD</bcp14> present file system location | ||||
| entries that correspond to file systems on other servers using a | ||||
| hostname. This would allow the client to interrogate the | ||||
| fs_locations on the destination server to obtain trunking information | ||||
| (as well as replica information) using integrity protection, | ||||
| validating the name provided while assuring that the response has | ||||
| not been modified in flight. | ||||
| </t> | ||||
| <t> | ||||
| When RPCSEC_GSS is not available on a server, the client needs | ||||
| to be aware of the fact that the location entries are subject to | ||||
| modification in flight and so cannot be relied upon. | ||||
| In the case of a client being directed to another server after NFS4ERR_MOVED, | ||||
| this could vitiate the | ||||
| authentication provided by the use of RPCSEC_GSS on the designated | ||||
| destination server. Even when RPCSEC_GSS authentication is available | ||||
| on the destination, the server might still properly authenticate as the | ||||
| server to which the client was erroneously directed. | ||||
| Without a way to decide whether | ||||
| the server is a valid one, the client can only determine, using | ||||
| RPCSEC_GSS, that the server corresponds to the name provided, with | ||||
| no basis for trusting that server. As a result, the client <bcp14>SHOULD | ||||
| NOT</bcp14> use such unverified location entries as a basis for migration, | ||||
| even though RPCSEC_GSS might be available on the destination. | ||||
| </t> | ||||
| <t> | ||||
| When a file system location attribute is fetched upon connecting with an | ||||
| NFS server, it <bcp14>SHOULD</bcp14>, as stated above, be done with integrity protection. | ||||
| When this is not possible, it is generally | ||||
| best for the client to ignore trunking and replica information or | ||||
| simply not fetch the location information for these purposes. | ||||
| </t> | ||||
| <t> | ||||
| When location information cannot be verified, it can be subjected | ||||
| to additional filtering to prevent the client from being | ||||
| inappropriately directed. For example, if a range of network | ||||
| addresses can be determined that assures that the servers and | ||||
| clients using AUTH_SYS are subject to the appropriate set of | ||||
| constraints (e.g., physical network isolation, administrative | ||||
| controls on the operating systems used), then network addresses | ||||
| in the appropriate range can be used with others discarded | ||||
| or restricted in their use of AUTH_SYS. | ||||
| </t> | ||||
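| <t> | ||||
| One possible form of such filtering is sketched below in Python using | ||||
| the standard ipaddress module. The function name and the idea of an | ||||
| administrator-supplied list of trusted networks are assumptions made | ||||
| for this example; this document does not define how the acceptable | ||||
| address range is determined. | ||||
| </t> | ||||
| <sourcecode type="python"><![CDATA[ | ||||
```python
import ipaddress


def filter_locations(addresses, trusted_networks):
    """Keep only the addresses that fall inside a trusted network.

    addresses and trusted_networks are lists of strings, for example
    "10.1.2.3" and "10.0.0.0/8". Addresses outside every trusted
    network are discarded.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    kept = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # An IPv4 address is never a member of an IPv6 network,
        # and vice versa, so mixed lists are handled safely.
        if any(ip in net for net in nets):
            kept.append(addr)
    return kept
```
| ]]></sourcecode> | ||||
| <t> | ||||
| A client using this sketch with the single trusted network 10.0.0.0/8 | ||||
| would keep 10.1.2.3 and discard 203.0.113.9 and 2001:db8::1. | ||||
| </t> | ||||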
| <t> | ||||
| To summarize considerations regarding the use of RPCSEC_GSS in | ||||
| fetching location information, we need to consider the following | ||||
| possibilities for requests to interrogate location information, with | ||||
| interrogation approaches on the referring and destination servers | ||||
| arrived at separately: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The use of integrity protection is <bcp14>RECOMMENDED</bcp14> | ||||
| in all cases, since the absence of integrity protection exposes | ||||
| the client to the possibility of the results being modified in transit. | ||||
| </li> | ||||
| <li> | ||||
| The use of requests issued without RPCSEC_GSS | ||||
| (i.e., using AUTH_SYS, which has no provision to avoid | ||||
| modification of data in flight), | ||||
| while undesirable and a potential security exposure, | ||||
| may not be avoidable in all cases. Where the use | ||||
| of the returned information cannot be avoided, it is made | ||||
| subject to filtering as described above to | ||||
| eliminate the possibility that the client would | ||||
| treat an invalid address as if it were an NFSv4 server. The | ||||
| specifics will vary depending on the degree of network isolation | ||||
| and whether the request is to the referring or destination servers. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Even if such requests are not interfered with in flight, it is possible | ||||
| for a compromised server to direct the client to use inappropriate servers, | ||||
| such as those under the control of the attacker. It is not clear that being | ||||
| directed to such servers represents a greater threat to the client than the | ||||
| damage that could be done by the compromised server itself. However, it | ||||
| is possible that some sorts of transient server compromises might be | ||||
| exploited to direct a client to a server capable of doing greater | ||||
| damage over a longer time. One useful step to guard against this | ||||
| possibility is to issue requests to fetch location data using RPCSEC_GSS, | ||||
| even if no mapping to an RPCSEC_GSS principal is available. In this case, | ||||
| RPCSEC_GSS would not be used, as it typically is, to identify the client | ||||
| principal to the server, but rather to make sure (via RPCSEC_GSS mutual | ||||
| authentication) that the server being contacted is the one intended. | ||||
| </t> | ||||
| <t> | ||||
| Similar considerations apply if the threat to be avoided is the redirection | ||||
| of client traffic to inappropriate (i.e., poorly performing) servers. In | ||||
| both cases, there is no reason for the information returned to depend on | ||||
| the identity of the client principal requesting it, while the validity of the | ||||
| server information, which has the capability to affect all client principals, | ||||
| is of considerable importance. | ||||
| </t> | ||||
| </section> | ||||
| <!-- [auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| <section anchor="ianaconsider" numbered="true" toc="default"> | ||||
| <name>IANA Considerations</name> | ||||
| <t> | ||||
| This section uses terms that are defined in <xref target="RFC8126" format="default"/>. | ||||
| </t> | ||||
| <section anchor="Iana-actions" numbered="true" toc="default"> | ||||
| <name>IANA Actions</name> | ||||
| <t> | ||||
| This update does not require any modification of, or additions to, registry | ||||
| entries or registry rules associated with NFSv4.1. However, since | ||||
| this document obsoletes RFC 5661, IANA has updated all references in registry entries and registry rules | ||||
| that point to RFC 5661 to point to this document instead. | ||||
| </t> | ||||
| <t> | ||||
| Previous actions by IANA related to NFSv4.1 are listed in the remaining | ||||
| subsections of <xref target="ianaconsider" format="default"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="namedattributesiana" numbered="true" toc="default"> | ||||
| <name>Named Attribute Definitions</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 Named Attribute Definitions Registry". | ||||
| </t> | ||||
| <t> | ||||
| The NFSv4.1 protocol supports the association of a file with zero or | ||||
| more named attributes. The namespace identifiers for these attributes | ||||
| are defined as string names. The protocol does not define the | ||||
| specific assignment of the namespace for these file attributes. | ||||
| The IANA registry promotes interoperability where common interests exist. | ||||
| While application developers are allowed to define and use | ||||
| attributes as needed, they are encouraged to register the | ||||
| attributes with IANA. | ||||
| </t> | ||||
| <t> | ||||
| Such registered named attributes are presumed to apply to all minor | ||||
| versions of NFSv4, including those defined subsequently to the | ||||
| registration. If the named attribute is intended to be | ||||
| limited to specific minor versions, this will be clearly stated in | ||||
| the registry's assignment. | ||||
| </t> | ||||
| <t> | ||||
| All assignments to the registry are made on a First Come First Served basis, | ||||
| per <xref target="RFC8126" sectionFormat="of" section="4.4"/>. | ||||
| The policy for each assignment is Specification Required, | ||||
| per <xref target="RFC8126" sectionFormat="of" section="4.6"/>. | ||||
| </t> | ||||
| <t> | ||||
| Under the NFSv4.1 specification, the name of a named | ||||
| attribute can in theory be up to 2<sup>32</sup> - 1 bytes in | ||||
| length, but in practice NFSv4.1 clients and servers | ||||
| will be unable to handle a string that long. IANA | ||||
| should reject any assignment request with a named | ||||
| attribute that exceeds 128 UTF-8 characters. To give the | ||||
| IESG the flexibility to set up bases of assignment of | ||||
| Experimental Use and Standards Action, | ||||
| the prefixes of "EXPE" and "STDS" are Reserved. | ||||
| The named attribute with a zero-length name is Reserved. | ||||
| </t> | ||||
| <t> | ||||
| The prefix "PRIV" is designated for Private Use. A | ||||
| site that wants to make use of unregistered named | ||||
| attributes without risk of conflicting with an | ||||
| assignment in IANA's registry should use the prefix | ||||
| "PRIV" in all of its named attributes. | ||||
| </t> | ||||
| <t> | ||||
| Because some NFSv4.1 clients and servers have case-insensitive | ||||
| semantics, the fifteen additional lower case and mixed case | ||||
| permutations of each of "EXPE", "PRIV", and "STDS" are Reserved (e.g., | ||||
| "expe", "expE", "exPe", etc. are Reserved). | ||||
| Similarly, IANA must not allow two assignments that would conflict | ||||
| if both named attributes were converted to a common case. | ||||
| </t> | ||||
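| <t> | ||||
| As an informal illustration of the case rules above (not part of the registry definition), the following Python sketch enumerates the fifteen additional spellings of a four-letter prefix and checks two names for a common-case conflict. The function names are hypothetical, and the use of casefold() as the "common case" conversion is an assumption. | ||||
| </t> |  ||||

```python
from itertools import product

# Prefixes named in the text above (Reserved or designated for Private Use).
SPECIAL_PREFIXES = ("EXPE", "PRIV", "STDS")


def case_permutations(word):
    """Return every upper/lower-case spelling of word as a set."""
    choices = [(ch.lower(), ch.upper()) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


def additional_permutations(prefix):
    """Spellings of prefix other than the all-upper-case original."""
    return case_permutations(prefix) - {prefix}


def has_special_prefix(name):
    """True if name starts with any case variant of a special prefix."""
    return name.upper().startswith(SPECIAL_PREFIXES)


def conflicts(name_a, name_b):
    """True if two names collide once converted to a common case."""
    return name_a.casefold() == name_b.casefold()
```

| <t> | ||||
| Each four-letter prefix has sixteen case spellings in total, which leaves fifteen once the upper-case form is set aside. | ||||
| </t> | ||||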
| <t> | ||||
| The registry of named attributes is a list of assignments, each | ||||
| containing three fields for each assignment. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| A US-ASCII string name that is the actual name of | ||||
| the attribute. This name must be unique. This | ||||
| string name can be 1 to 128 UTF-8 characters | ||||
| long. | ||||
| </li> | ||||
| <li> | ||||
| A reference to the specification of the named attribute. | ||||
| The reference can consume up to 256 bytes (or more if IANA | ||||
| permits). | ||||
| </li> | ||||
| <li> | ||||
| The point of contact of the registrant. The point | ||||
| of contact can consume up to 256 bytes (or more if IANA | ||||
| permits). | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| There is no initial registry. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The registrant is always permitted to update the point of contact | ||||
| field. Any other change will require Expert Review or IESG | ||||
| Approval. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="notifyiana" numbered="true" toc="default"> | ||||
| <name>Device ID Notifications</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 Device ID | ||||
| Notifications Registry". | ||||
| </t> | ||||
| <t> | ||||
| The potential exists for new notification types to be | ||||
| added to the CB_NOTIFY_DEVICEID operation (see <xref target="OP_CB_NOTIFY_DEVICEID" format="default"/>). This can be done | ||||
| via changes to the operations that register | ||||
| notifications, or by adding new operations to NFSv4. | ||||
| This requires a new minor version of NFSv4, and | ||||
| requires a Standards Track document from the IETF. | ||||
| Another way to add a notification is to specify a new | ||||
| layout type (see <xref target="pnfsiana" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| Hence, all assignments to the registry are made on a Standards Action | ||||
| basis per <xref target="RFC8126" section="4.9" sectionFormat="of" format="default"/>, with | ||||
| Expert Review required. | ||||
| </t> | ||||
| <t> | ||||
| The registry is a list of assignments, each containing | ||||
| five fields per assignment. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The name of the notification type. This name must have the | ||||
| prefix "NOTIFY_DEVICEID4_". This name must be unique. | ||||
| </li> | ||||
| <li> | ||||
| The value of the notification. IANA will assign | ||||
| this number, and the request from the registrant | ||||
| will use TBD1 instead of an actual value. IANA | ||||
| <bcp14>MUST</bcp14> use a whole number that can be no higher | ||||
| than 2<sup>32</sup>-1, and should be the next available | ||||
| value. The value assigned must be unique. | ||||
| A Designated Expert must be used to | ||||
| ensure that when the name of the notification | ||||
| type and its value are added to the NFSv4.1 | ||||
| notify_deviceid_type4 enumerated data type in the | ||||
| NFSv4.1 XDR description <xref target="RFC5662" format="default"/>, the result continues to | ||||
| be a valid XDR description. | ||||
| </li> | ||||
| <li> | ||||
| The Standards Track RFC(s) that describe the | ||||
| notification. If the RFC(s) have not yet been | ||||
| published, the registrant will use RFCTBD2, RFCTBD3, etc. instead | ||||
| of an actual RFC number. | ||||
| </li> | ||||
| <li> | ||||
| How the RFC introduces the notification. This is | ||||
| indicated by a single US-ASCII value. If the | ||||
| value is N, it means a minor revision to the | ||||
| NFSv4 protocol. If the value is L, it means a new | ||||
| pNFS layout type. Other values can be used with | ||||
| IESG Approval. | ||||
| </li> | ||||
| <li> | ||||
| The minor versions of NFSv4 that are allowed to | ||||
| use the notification. While these are numeric | ||||
| values, IANA will not allocate and assign them; | ||||
| the author of the relevant RFCs with IESG | ||||
| Approval assigns these numbers. Each time there is a | ||||
| new minor version of NFSv4 approved, a Designated | ||||
| Expert should review the registry to make recommended | ||||
| updates as needed. | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| The initial registry is in <xref target="devnotelist" format="default"/>. Note that the | ||||
| next available value is zero. | ||||
| </t> | ||||
| <table anchor="devnotelist" align="center"> | ||||
| <name>Initial Device ID Notification Assignments</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Notification Name</th> | ||||
| <th align="left">Value</th> | ||||
| <th align="left">RFC</th> | ||||
| <th align="left">How</th> | ||||
| <th align="left">Minor Versions</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">NOTIFY_DEVICEID4_CHANGE</td> | ||||
| <td align="left">1</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">NOTIFY_DEVICEID4_DELETE</td> | ||||
| <td align="left">2</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The update of a registration will require IESG | ||||
| Approval on the advice of a Designated Expert. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="recalliana" numbered="true" toc="default"> | ||||
| <name>Object Recall Types</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 Recallable Object Types Registry". | ||||
| </t> | ||||
| <t> | ||||
| The potential exists for new object types to be added to the CB_RECALL_ANY operation (see | ||||
| <xref target="OP_CB_RECALL_ANY" format="default"/>). This can be done via changes to | ||||
| the operations that add recallable types, or by adding new operations | ||||
| to NFSv4. This requires a new minor version of NFSv4, and requires | ||||
| a Standards Track document from the IETF. Another way to | ||||
| add a new recallable object is to specify a new layout type (see <xref target="pnfsiana" format="default"/>). | ||||
| </t> | ||||
| <t> | ||||
| All assignments to the registry are made on a Standards Action | ||||
| basis per <xref target="RFC8126" sectionFormat="of" section="4.9"/>, with | ||||
| Expert Review required. | ||||
| </t> | ||||
| <t> | ||||
| Recallable object types are 32-bit unsigned numbers. There are no Reserved | ||||
| values. Values in the range 12 through 15, inclusive, are designated for Private | ||||
| Use. | ||||
| </t> | ||||
| <t> | ||||
| The registry is a list of assignments, each containing | ||||
| five fields per assignment. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The name of the recallable object type. This name must have the | ||||
| prefix "RCA4_TYPE_MASK_". The name must be unique. | ||||
| </li> | ||||
| <li> | ||||
| The value of the recallable object type. IANA | ||||
| will assign this number, and the request from the | ||||
| registrant will use TBD1 instead of an actual | ||||
| value. IANA <bcp14>MUST</bcp14> use a whole number that can be | ||||
| no higher than 2<sup>32</sup>-1, and should be the next | ||||
| available value. The value must be unique. A | ||||
| Designated Expert must be used to ensure that | ||||
| when the name of the recallable type and its | ||||
| value are added to the NFSv4 XDR description | ||||
| <xref target="RFC5662" format="default"/>, | ||||
| the result continues to be a valid XDR | ||||
| description. | ||||
| </li> | ||||
| <li> | ||||
| The Standards Track RFC(s) that describe the | ||||
| recallable object type. If the RFC(s) have not yet been | ||||
| published, the registrant will use RFCTBD2, RFCTBD3, etc. instead | ||||
| of an actual RFC number. | ||||
| </li> | ||||
| <li> | ||||
| How the RFC introduces the recallable object type. This is | ||||
| indicated by a single US-ASCII value. If the | ||||
| value is N, it means a minor revision to the | ||||
| NFSv4 protocol. If the value is L, it means a new | ||||
| pNFS layout type. Other values can be used with | ||||
| IESG Approval. | ||||
| </li> | ||||
| <li> | ||||
| The minor versions of NFSv4 that are allowed to | ||||
| use the recallable object type. While these | ||||
| are numeric values, IANA will not allocate and | ||||
| assign them; the author of the relevant RFCs with | ||||
| IESG Approval assigns these numbers. Each time | ||||
| there is a new minor version of NFSv4 approved, a | ||||
| Designated Expert should review the registry to | ||||
| make recommended updates as needed. | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| The initial registry is in <xref target="recalllist" format="default"/>. Note that | ||||
| the next available value is five. | ||||
| </t> | ||||
| <table anchor="recalllist" align="center"> | ||||
| <name>Initial Recallable Object Type Assignments</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Recallable Object Type Name</th> | ||||
| <th align="left">Value</th> | ||||
| <th align="left">RFC</th> | ||||
| <th align="left">How</th> | ||||
| <th align="left">Minor Versions</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_RDATA_DLG</td> | ||||
| <td align="left">0</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_WDATA_DLG</td> | ||||
| <td align="left">1</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_DIR_DLG</td> | ||||
| <td align="left">2</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_FILE_LAYOUT</td> | ||||
| <td align="left">3</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_BLK_LAYOUT</td> | ||||
| <td align="left">4</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">L</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_OBJ_LAYOUT_MIN</td> | ||||
| <td align="left">8</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">L</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">RCA4_TYPE_MASK_OBJ_LAYOUT_MAX</td> | ||||
| <td align="left">9</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">L</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
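| <t> | ||||
| The "next available value" noted above can be reproduced with a short Python sketch. It assumes that "next available" means the lowest value that is neither assigned nor designated for Private Use; the function and constant names are hypothetical. | ||||
| </t> | ||||

```python
# Values taken from the initial registry table above.
ASSIGNED = {0, 1, 2, 3, 4, 8, 9}
# Values 12 through 15, inclusive, are designated for Private Use.
PRIVATE_USE = range(12, 16)


def next_available(assigned, private_use=PRIVATE_USE, limit=2**32):
    """Return the lowest value that is neither assigned nor Private Use."""
    for value in range(limit):
        if value not in assigned and value not in private_use:
            return value
    raise ValueError("no value available")
```
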
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The update of a registration will require IESG | ||||
| Approval on the advice of a Designated Expert. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="pnfsiana" numbered="true" toc="default"> | ||||
| <name>Layout Types</name> | ||||
| <t> | ||||
| IANA created a registry called the "pNFS Layout Types Registry". | ||||
| </t> | ||||
| <t> | ||||
| All assignments to the registry are made on a Standards Action basis, | ||||
| with Expert Review required. | ||||
| </t> | ||||
| <t> | ||||
| Layout types are 32-bit numbers. The value zero is Reserved. | ||||
| Values in the range 0x80000000 to 0xFFFFFFFF inclusive are designated for Private Use. | ||||
| IANA will assign numbers from the range | ||||
| 0x00000001 to 0x7FFFFFFF inclusive. | ||||
| </t> | ||||
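| <t> | ||||
| The ranges above can be summarized in a short Python sketch. The function name and the returned labels are hypothetical and are given only to illustrate the partition of the 32-bit value space. | ||||
| </t> | ||||

```python
def classify_layout_type(value):
    """Classify a 32-bit layout type value by the ranges given above."""
    if value not in range(0, 0x100000000):
        raise ValueError("layout types are 32-bit numbers")
    if value == 0:
        return "reserved"
    if value in range(0x00000001, 0x80000000):
        # 0x00000001 through 0x7FFFFFFF inclusive: assigned by IANA.
        return "iana-assigned"
    # 0x80000000 through 0xFFFFFFFF inclusive: Private Use.
    return "private-use"
```
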
| <t> | ||||
| The registry is a list of assignments, each | ||||
| containing five fields. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The name of the layout type. This name must have the | ||||
| prefix "LAYOUT4_". The name must be unique. | ||||
| </li> | ||||
| <li> | ||||
| The value of the layout type. IANA will assign | ||||
| this number, and the request from the registrant | ||||
| will use TBD1 instead of an actual value. The value | ||||
| assigned must be unique. | ||||
| A Designated Expert must be used to ensure | ||||
| that when the name of the layout type and | ||||
| its value are added to the NFSv4.1 layouttype4 | ||||
| enumerated data type in the NFSv4.1 XDR | ||||
| description <xref target="RFC5662" format="default"/>, | ||||
| the result continues to be a valid XDR | ||||
| description. | ||||
| </li> | ||||
| <li> | ||||
| The Standards Track RFC(s) that describe the | ||||
| layout type. If the RFC(s) have not yet been | ||||
| published, the registrant will use RFCTBD2, RFCTBD3, etc. instead | ||||
| of an actual RFC number. Collectively, the RFC(s) must adhere to | ||||
| the guidelines listed in <xref target="layout_guidelines" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| How the RFC introduces the layout type. This is | ||||
| indicated by a single US-ASCII value. If the | ||||
| value is N, it means a minor revision to the | ||||
| NFSv4 protocol. If the value is L, it means a new | ||||
| pNFS layout type. Other values can be used with | ||||
| IESG Approval. | ||||
| </li> | ||||
| <li> | ||||
| The minor versions of NFSv4 that are allowed to | ||||
| use the layout type. While these are numeric | ||||
| values, IANA will not allocate and assign them; | ||||
| the author of the relevant RFCs with IESG | ||||
| Approval assigns these numbers. Each time there is | ||||
| a new minor version of NFSv4 approved, a Designated | ||||
| Expert should review the registry to make recommended | ||||
| updates as needed. | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| The initial registry is in <xref target="layoutlist" format="default"/>. | ||||
| </t> | ||||
| <table anchor="layoutlist" align="center"> | ||||
| <name>Initial Layout Type Assignments</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Layout Type Name</th> | ||||
| <th align="left">Value</th> | ||||
| <th align="left">RFC</th> | ||||
| <th align="left">How</th> | ||||
| <th align="left">Minor Versions</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">LAYOUT4_NFSV4_1_FILES</td> | ||||
| <td align="left">0x1</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">N</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LAYOUT4_OSD2_OBJECTS</td> | ||||
| <td align="left">0x2</td> | ||||
| <td align="left">RFC 5664</td> | ||||
| <td align="left">L</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">LAYOUT4_BLOCK_VOLUME</td> | ||||
| <td align="left">0x3</td> | ||||
| <td align="left">RFC 5663</td> | ||||
| <td align="left">L</td> | ||||
| <td align="left">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The update of a registration will require IESG | ||||
| Approval on the advice of a Designated Expert. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="layout_guidelines" numbered="true" toc="default"> | ||||
| <name>Guidelines for Writing Layout Type Specifications</name> | ||||
| <t> | ||||
| The author of a new pNFS layout specification must follow these | ||||
| steps to obtain acceptance of the layout type as a Standards Track RFC: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The author devises the new layout specification. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The new layout type specification <bcp14>MUST</bcp14>, at a minimum: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Define the contents of the layout-type-specific fields of the | ||||
| following data types: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| the da_addr_body field of the device_addr4 | ||||
| data type; | ||||
| </li> | ||||
| <li> | ||||
| the loh_body field of the layouthint4 | ||||
| data type; | ||||
| </li> | ||||
| <li> | ||||
| the loc_body field of the layout_content4 | ||||
| data type (which in turn is the lo_content field of the | ||||
| layout4 data type); | ||||
| </li> | ||||
| <li> | ||||
| the lou_body field of the layoutupdate4 | ||||
| data type; | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| Describe or define the storage access protocol used to access | ||||
| the storage devices. | ||||
| </li> | ||||
| <li> | ||||
| Describe whether revocation of layouts is supported. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| At a minimum, describe the methods of recovery from: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> Failure and restart for client, server, storage device. | ||||
| </li> | ||||
| <li> Lease expiration from the perspective of the active client, | ||||
| server, storage device. | ||||
| </li> | ||||
| <li> Loss of layout state resulting in fencing of client | ||||
| access to storage devices (for an example, see | ||||
| <xref target="lease_expiration_mds" format="default"/>). | ||||
| </li> | ||||
| </ol> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| Include an IANA considerations section, which will | ||||
| in turn include: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A request to IANA | ||||
| for a new layout type per <xref target="pnfsiana" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| A list of requests to IANA for | ||||
| any new recallable object types for | ||||
| CB_RECALL_ANY; each entry is to be presented in the form described | ||||
| in <xref target="recalliana" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| A list of requests to IANA for | ||||
| any new notification values for | ||||
| CB_NOTIFY_DEVICEID; each entry is to be presented in the form | ||||
| described in <xref target="notifyiana" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| Include a security considerations section. This section <bcp14>MUST</bcp14> | ||||
| explain how the NFSv4.1 authentication, authorization, and | ||||
| access-control models are preserved. That is, if a metadata server | ||||
| would restrict a READ or WRITE operation, how would pNFS via | ||||
| the layout similarly restrict a corresponding input or | ||||
| output operation? | ||||
| </li> | ||||
| </ul> | ||||
| </li> | ||||
| <li> | ||||
| The author documents the new layout specification as an Internet-Draft. | ||||
| </li> | ||||
| <li> | ||||
| The author submits the Internet-Draft for review through the | ||||
| IETF standards process as defined in "The Internet Standards | ||||
| Process--Revision 3" (BCP 9). | ||||
| The new layout specification will be | ||||
| submitted for eventual publication as a Standards Track RFC. | ||||
| </li> | ||||
| <li> | ||||
| The layout specification progresses through the IETF standards | ||||
| process. | ||||
| </li> | ||||
| </ol> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="path_var_iana" numbered="true" toc="default"> | ||||
| <name>Path Variable Definitions</name> | ||||
| <t> | ||||
| This section deals with the IANA considerations associated with | ||||
| the variable substitution feature for location names as | ||||
| described in <xref target="SEC11-fsli-item" format="default"/>. As | ||||
| described there, variables subject to substitution consist | ||||
| of a domain name and a specific name within that domain, with the | ||||
| two separated by a colon. There are two sets of IANA considerations | ||||
| here: | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The list of variable names. | ||||
| </li> | ||||
| <li> | ||||
| For each variable name, the list of possible values. | ||||
| </li> | ||||
| </ol> | ||||
| <t> | ||||
| Thus, there will be one registry for the list of variable names, and | ||||
| possibly one registry for listing the values of each variable name. | ||||
| </t> | ||||
| <section anchor="path_variables_iana" numbered="true" toc="default"> | ||||
| <name>Path Variables Registry</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 Path Variables Registry". | ||||
| </t> | ||||
| <section anchor="path_values_iana" numbered="true" toc="default"> | ||||
| <name>Path Variable Values</name> | ||||
| <t> | ||||
| Variable names are of the form "${", followed by a | ||||
| domain name, followed by a colon (":"), followed by | ||||
| a domain-specific portion of the variable name, | ||||
| followed by "}". When the domain name is "ietf.org", | ||||
| all variable names must be registered with IANA on | ||||
| a Standards Action basis, with Expert Review | ||||
| required. Path variables with registered domain | ||||
| names neither part of nor equal to ietf.org are | ||||
| assigned on a Hierarchical Allocation basis | ||||
| (delegating to the domain owner) and thus of no | ||||
| concern to IANA, unless the domain owner chooses to | ||||
| register a variable name from their domain. If the | ||||
| domain owner chooses to do so, IANA will do so on a | ||||
| First Come First Served basis. To accommodate | ||||
| registrants who do not have their own domain, IANA | ||||
| will accept requests to register variables with the | ||||
| prefix "${FCFS.ietf.org:" on a First Come First | ||||
| Served basis. Assignments on a First Come First Served basis | ||||
| do not require Expert Review, unless the registrant also | ||||
| wants IANA to establish a registry for the values of the | ||||
| registered variable. | ||||
| </t> | ||||
| <t> | ||||
| The registry is a list of assignments, each | ||||
| containing three fields. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| The name of the variable. The name of this | ||||
| variable must start with a "${" followed by a | ||||
| registered domain name, followed by ":", or it | ||||
| must start with "${FCFS.ietf.org". The name must | ||||
| be no more than 64 UTF-8 characters long. The | ||||
| name must be unique. | ||||
| </li> | ||||
| <li> | ||||
| For assignments made on a Standards Action basis, | ||||
| the Standards Track RFC(s) that describe the | ||||
| variable. If the RFC(s) have not yet been | ||||
| published, the registrant will use RFCTBD1, | ||||
| RFCTBD2, etc. instead of an actual RFC number. | ||||
| Note that the RFCs do not have to be a part of an NFS minor version. | ||||
| For assignments made on a First Come First Served basis, an explanation | ||||
| (consuming no more than 1024 bytes, or more if IANA permits) | ||||
| of the purpose of the variable. A reference to the explanation can | ||||
| be substituted. | ||||
| </li> | ||||
| <li> | ||||
| The point of contact, including an email address. The point of | ||||
| contact can consume up to 256 bytes (or more if IANA permits). | ||||
| For assignments made on a Standards Action basis, the point of | ||||
| contact is always IESG. | ||||
| </li> | ||||
| </ol> | ||||
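| <t> | ||||
| As a non-normative illustration, the following Python sketch splits a variable name of the form described above into its domain and domain-specific parts and reports which assignment basis applies. The function names are hypothetical. Comparing domain names case-insensitively and requiring a non-empty domain-specific part are assumptions of the sketch, and it does not check whether a domain is actually registered. | ||||
| </t> | ||||

```python
MAX_NAME_CHARS = 64  # limit stated for names in this registry


def parse_path_variable(name):
    """Split '${domain:specific}' into (domain, specific) or raise ValueError."""
    if len(name) > MAX_NAME_CHARS:
        raise ValueError("name exceeds 64 characters")
    if not (name.startswith("${") and name.endswith("}")):
        raise ValueError("name must look like ${domain:specific}")
    domain, sep, specific = name[2:-1].partition(":")
    if not (domain and sep and specific):
        raise ValueError("name needs a domain, a colon, and a specific part")
    return domain, specific


def assignment_policy(name):
    """Return the assignment basis that the text above gives for this name."""
    domain, _ = parse_path_variable(name)
    domain = domain.lower()  # assumption: domains compare case-insensitively
    if domain == "ietf.org":
        return "Standards Action with Expert Review"
    if domain == "fcfs.ietf.org":
        return "First Come First Served"
    if domain.endswith(".ietf.org"):
        # Other subdomains of ietf.org are not addressed by the text above.
        return "not covered by the text above"
    return "Hierarchical Allocation"
```
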
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| The initial registry is in <xref target="varlist" format="default"/>. | ||||
| </t> | ||||
| <table anchor="varlist" align="center"> | ||||
| <name>Initial List of Path Variables</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="left">Variable Name</th> | ||||
| <th align="left">RFC</th> | ||||
| <th align="left">Point of Contact</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="left">${ietf.org:CPU_ARCH}</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">IESG</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">${ietf.org:OS_TYPE}</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">IESG</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="left">${ietf.org:OS_VERSION}</td> | ||||
| <td align="left">RFC 8881</td> | ||||
| <td align="left">IESG</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> | ||||
| IANA has created registries for the values | ||||
| of the variable names ${ietf.org:CPU_ARCH} and | ||||
| ${ietf.org:OS_TYPE}. See Sections <xref target="cpu_arch" format="counter"/> | ||||
| and <xref target="os_type" format="counter"/>. | ||||
| </t> | ||||
| <t> | ||||
| For the values of the variable | ||||
| ${ietf.org:OS_VERSION}, no registry is needed, as | ||||
| the specifics of the values of the variable will | ||||
| vary with the value of ${ietf.org:OS_TYPE}. Thus, | ||||
| values for ${ietf.org:OS_VERSION} are assigned on a | ||||
| Hierarchical Allocation basis and are of no concern | ||||
| to IANA. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The update of an assignment made on a Standards Action basis | ||||
| will require IESG Approval on the advice of a Designated Expert. | ||||
| </t> | ||||
| <t> | ||||
| The registrant can always update the point of contact of an assignment | ||||
| made on a First Come First Served basis. Any other update will require | ||||
| Expert Review. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="cpu_arch" numbered="true" toc="default"> | ||||
| <name>Values for the ${ietf.org:CPU_ARCH} Variable</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 ${ietf.org:CPU_ARCH} Value Registry". | ||||
| </t> | ||||
| <t> | ||||
| Assignments to the registry are made on a First Come First Served | ||||
| basis. The zero-length value of ${ietf.org:CPU_ARCH} is Reserved. | ||||
| Values with a prefix of "PRIV" are designated for Private Use. | ||||
| </t> | ||||
| <t> | ||||
| The registry is a list of assignments, each | ||||
| containing three fields. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| A value of the ${ietf.org:CPU_ARCH} variable. The value | ||||
| must be 1 to 32 UTF-8 characters long. The value must be unique. | ||||
| </li> | ||||
| <li> | ||||
| An explanation (consuming no more than 1024 | ||||
| bytes, or more if IANA permits) of what CPU | ||||
| architecture the value denotes. A reference to | ||||
| the explanation can be substituted. | ||||
| </li> | ||||
| <li> | ||||
| The point of contact, including an email address. The point of | ||||
| contact can consume up to 256 bytes (or more if IANA permits). | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| There is no initial registry. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The registrant is free to update the assignment, i.e., change the | ||||
| explanation and/or point-of-contact fields. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="os_type" numbered="true" toc="default"> | ||||
| <name>Values for the ${ietf.org:OS_TYPE} Variable</name> | ||||
| <t> | ||||
| IANA created a registry called the "NFSv4 ${ietf.org:OS_TYPE} Value Registry". | ||||
| </t> | ||||
| <t> | ||||
| Assignments to the registry are made on a First Come First Served | ||||
| basis. The zero-length value of ${ietf.org:OS_TYPE} is Reserved. | ||||
| Values with a prefix of "PRIV" are designated for Private Use. | ||||
| </t> | ||||
| <t> | ||||
| The registry is a list of assignments, each | ||||
| containing three fields. | ||||
| </t> | ||||
| <ol spacing="normal" type="1"> | ||||
| <li> | ||||
| A value of the ${ietf.org:OS_TYPE} variable. The value | ||||
| must be 1 to 32 UTF-8 characters long. The value must be unique. | ||||
| </li> | ||||
| <li> | ||||
| An explanation (consuming no more than 1024 | ||||
| bytes, or more if IANA permits) of what operating | ||||
| system type the value denotes. A reference to | ||||
| the explanation can be substituted. | ||||
| </li> | ||||
| <li> | ||||
| The point of contact, including an email address. The point of | ||||
| contact can consume up to 256 bytes (or more if IANA permits). | ||||
| </li> | ||||
| </ol> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Initial Registry</name> | ||||
| <t> | ||||
| There is no initial registry. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Updating Registrations</name> | ||||
| <t> | ||||
| The registrant is free to update the assignment, i.e., change the | ||||
| explanation and/or point-of-contact fields. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
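<t>
The field constraints shared by the two value registries above can be
sketched as a small validator. This is a minimal, non-normative sketch:
the Assignment type, the check_assignment function, and any sample values
are hypothetical names introduced for illustration. It assumes that
"characters" means Unicode code points, that the byte limits apply to the
UTF-8 encoding of each field, and that values with the "PRIV" prefix are
not recorded in the registry. The 1024-byte and 256-byte limits are
treated as defaults, since IANA may permit more.
</t>

```python
from dataclasses import dataclass
from typing import Iterable, List

# Limits taken from the registry field descriptions. IANA may permit larger
# explanation and contact fields, so the byte limits are only defaults.
MAX_VALUE_CHARS = 32
MAX_EXPLANATION_BYTES = 1024
MAX_CONTACT_BYTES = 256
PRIVATE_USE_PREFIX = "PRIV"


@dataclass(frozen=True)
class Assignment:
    """One hypothetical registry row: value, explanation, point of contact."""
    value: str
    explanation: str
    contact: str


def check_assignment(entry: Assignment, existing_values: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks acceptable."""
    problems = []
    length = len(entry.value)  # assumption: length is counted in code points
    if length == 0:
        problems.append("zero-length value is Reserved")
    elif length > MAX_VALUE_CHARS:
        problems.append("value exceeds 32 characters")
    # Assumption: Private Use values are never registered with IANA.
    if entry.value.startswith(PRIVATE_USE_PREFIX):
        problems.append("PRIV-prefixed values are designated for Private Use")
    if entry.value in set(existing_values):
        problems.append("value must be unique")
    if len(entry.explanation.encode("utf-8")) > MAX_EXPLANATION_BYTES:
        problems.append("explanation exceeds 1024 bytes")
    if len(entry.contact.encode("utf-8")) > MAX_CONTACT_BYTES:
        problems.append("point of contact exceeds 256 bytes")
    return problems
```

<t>
Returning every problem found, rather than stopping at the first one,
lets a registrant correct a request in a single pass. Because neither
value registry has initial contents, the set of existing values passed
to such a check would start out empty.
</t>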
| </section> | ||||
| </section> | ||||
| <!--[auth] $Id: 2009-12-20-TO-rfc5661.xml,v 1.2 2009/12/21 05:59:32 shepler.mre Exp $ --> | ||||
| </middle> | ||||
| <back> | ||||
| <references> | ||||
| <name>References</name> | ||||
| <references> | ||||
| <name>Normative References</name> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4506.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5531.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2203.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4121.xml"/> | ||||
| <reference anchor="hardlink" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title abbrev="Open Group">Section 3.191 of Chapter 3 of | ||||
| Base Definitions of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2743.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5040.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5403.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5662.xml"/> | ||||
| <reference anchor="symlink" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 3.372 of Chapter 3 of | ||||
| Base Definitions of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5665.xml"/> | ||||
| <reference anchor="read_atime" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'read()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="readdir_atime" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'readdir()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="write_atime" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'write()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3454.xml"/> | ||||
| <reference anchor="chmod" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'chmod()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="ISO.10646-1.1993"> | ||||
| <front> | ||||
| <title>Information Technology - | ||||
| Universal Multiple-octet coded Character Set (UCS) - | ||||
| Part 1: Architecture and Basic Multilingual Plane </title> | ||||
| <seriesInfo name="ISO" value="Standard 10646-1"/> | ||||
| <author> | ||||
| <organization>International Organization for Standardization | ||||
| </organization> | ||||
| </author> | ||||
| <date month="May" year="1993"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2277.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3491.xml"/> | ||||
| <reference anchor="fcntl" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'fcntl()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="fsync" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'fsync()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="passwd" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'getpwnam()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="unlink" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>Section 'unlink()' of | ||||
| System Interfaces of The Open Group Base Specifications Issue 6 | ||||
| IEEE Std 1003.1, 2004 Edition, HTML Version </title> | ||||
| <seriesInfo name="ISBN" value="1931624232"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| </front> | ||||
| </reference> | ||||
| <!-- [auth] obsoleted by RFC 5531 | ||||
| <reference anchor='RFC1831'> | ||||
| <front> | ||||
| <title abbrev='Remote Procedure Call Protocol Version 2'>RPC: | ||||
| Remote Procedure Call Protocol Specification Version 2</title> | ||||
| <author initials='R.' surname='Srinivasan' fullname='Raj Srinivasan'> | ||||
| <organization>Sun Microsystems, Inc., ONC Technologies</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>2550 Garcia Avenue</street> | ||||
| <street>M/S MTV-5-40</street> | ||||
| <city>Mountain View</city> | ||||
| <region>CA</region> | ||||
| <code>94043</code> | ||||
| <country>US</country></postal> | ||||
| <phone>+1 415 336 2478</phone> | ||||
| <facsimile>+1 415 336 6015</facsimile> | ||||
| <email>raj@eng.sun.com</email></address></author> | ||||
| <date year='1995' month='August' /> | ||||
| <abstract> | ||||
| <t>This document describes the ONC Remote Procedure Call (ONC | ||||
| RPC Version 2) protocol as it is currently deployed and | ||||
| accepted. "ONC" stands for "Open Network | ||||
| Computing".</t></abstract></front> | ||||
| <seriesInfo name='RFC' value='1831' /> | ||||
| <format type='TXT' octets='37798' target='ftp://ftp.isi.edu/in-notes/rfc1831.txt' /> | ||||
| </reference> --> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4055.xml"/> | ||||
| <reference anchor="CSOR_AES" target="http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/algorithms.html"> | ||||
| <front> | ||||
| <title>Cryptographic Algorithm Object Registration | ||||
| </title> | ||||
| <author> | ||||
| <organization>National Institute of Standards and Technology | ||||
| </organization> | ||||
| </author> | ||||
| <date month="November" year="2007"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7861.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4120.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4033.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7858.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8000.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8166.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8267.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8484.xml"/> | ||||
| <!-- Add this ref if we can add a reference to BCP 9 (mentioned in the IC section): | ||||
| <referencegroup anchor="BCP09" target="https://www.rfc-editor.org/info/bcp9"> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2026.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7127.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5657.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6410.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7100.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7475.xml"/> | ||||
| </referencegroup> | ||||
| --> | ||||
| </references> | ||||
| <references> | ||||
| <name>Informative References</name> | ||||
| <!--draft-roach-bis-documents expired --> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-roach-bis-documents-00.xml"/> | ||||
| <!-- RFC 3530 (NFSv4 version 0) is obsoleted by RFC 7530, but is | ||||
| mentioned in historical context. | ||||
| --> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3530.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1813.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2847.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2623.xml"/> | ||||
| <reference anchor="Chet"> | ||||
| <front> | ||||
| <title>Improving the Performance | ||||
| and Correctness of an NFS Server</title> | ||||
| <author initials="C." surname="Juszczak" fullname="Chet Juszczak"> | ||||
| <organization>Digital Equipment Corporation</organization> | ||||
| </author> | ||||
| <date month="June" year="1990"/> | ||||
| <abstract> | ||||
| <t> | ||||
| Describes a reply cache implementation that | ||||
| avoids work in the server by handling | ||||
| duplicate requests. More importantly, though | ||||
| listed as a side effect, the reply cache | ||||
| aids in avoiding the destructive re-application | ||||
| of non-idempotent operations, thereby | ||||
| improving correctness. | ||||
| </t> | ||||
| </abstract> | ||||
| </front> | ||||
| <refcontent>USENIX Conference Proceedings</refcontent> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3232.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1833.xml"/> | ||||
| <reference anchor="rpc_xid_issues"> | ||||
| <front> | ||||
| <title>RPC XID Issues</title> | ||||
| <author initials="R." surname="Werme" fullname="Ric Werme"> | ||||
| <organization>Digital Equipment Corporation</organization> | ||||
| </author> | ||||
| <date month="February" year="1996"/> | ||||
| <abstract> | ||||
| <t> | ||||
| The presentation provides implementation advice for | ||||
| ONC RPC transaction identifier (xid) generation. | ||||
| </t> | ||||
| </abstract> | ||||
| </front> | ||||
| <refcontent>USENIX Conference Proceedings</refcontent> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1094.xml"/> | ||||
| <!-- Found the following | ||||
| http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.7106&rep=rep1&type=pdf | ||||
| --> | ||||
| <reference anchor="ha_nfs_ibm"> | ||||
| <front> | ||||
| <title>A Highly Available Network File Server</title> | ||||
| <author initials="A." surname="Bhide" fullname="Anupam Bhide"> | ||||
| <organization>IBM T.J. Watson Research Center</organization> | ||||
| </author> | ||||
| <author initials="E. N." surname="Elnozahy" fullname="Elmootazbellah N. Elnozahy"> | ||||
| <organization>IBM T.J. Watson Research Center</organization> | ||||
| </author> | ||||
| <author initials="S. P." surname="Morgan" fullname="Stephen P. Morgan"> | ||||
| <organization>IBM T.J. Watson Research Center</organization> | ||||
| </author> | ||||
| <date month="January" year="1991"/> | ||||
| <abstract> | ||||
| <t> | ||||
| This paper presents the design and implementation | ||||
| of a Highly Available Network File Server | ||||
| (HA-NFS). We separate the problem of network | ||||
| file server reliability into three different subproblems: | ||||
| server reliability, disk reliability, and network | ||||
| reliability. HA-NFS offers a different solution | ||||
| for each: dual-ported disks and impersonation | ||||
| are used to provide server reliability, disk mirroring | ||||
| can be used to provide disk reliability, and optional | ||||
| network replication can be used to provide | ||||
| network reliability. The implementation shows | ||||
| that HA-NFS provides high availability without | ||||
| the excessive resource overhead or the performance | ||||
| degradation that characterize traditional replication | ||||
| methods. Ongoing operations are not aborted | ||||
| during fail-over and recovery is completely transparent | ||||
| to applications. HA-NFS adheres to the | ||||
| NFS protocol standard and can be used by existing | ||||
| NFS clients without modification. | ||||
| </t> | ||||
| </abstract> | ||||
| </front> | ||||
| <refcontent>USENIX Conference Proceedings</refcontent> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5664.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5663.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2054.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2055.xml"/> | ||||
| <reference anchor="errata" target="https://www.ietf.org/about/groups/iesg/statements/processing-rfc-errata/"> | ||||
| <front> | ||||
| <title>IESG Processing of RFC Errata for the IETF Stream | ||||
| </title> | ||||
| <author> | ||||
| <organization>IESG | ||||
| </organization> | ||||
| </author> | ||||
| <date month="July" year="2008"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2104.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2624.xml"/> | ||||
| <reference anchor="xnfs"> | ||||
| <front> | ||||
| <title>Protocols for Interworking: XNFS, Version 3W</title> | ||||
| <seriesInfo name="ISBN" value="1-85912-184-5"/> | ||||
| <author> | ||||
| <organization>The Open Group </organization> | ||||
| </author> | ||||
| <date month="February" year="1998"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="Floyd"> | ||||
| <front> | ||||
| <title>The Synchronization of Periodic Routing Messages</title> | ||||
| <author initials="S." surname="Floyd"> | ||||
| <organization/> | ||||
| </author> | ||||
| <author initials="V." surname="Jacobson"> | ||||
| <organization/> | ||||
| </author> | ||||
| <date month="April" year="1994"/> | ||||
| </front> | ||||
| <refcontent>IEEE/ACM Transactions on Networking, 2(2), pp. 122-136</refcontent> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3720.xml"/> | ||||
| <reference anchor="FCP-2"> | ||||
| <front> | ||||
| <title>Fibre Channel Protocol for SCSI, 2nd Version (FCP-2)</title> | ||||
| <author initials="R." surname="Snively" fullname="Robert Snively"> | ||||
| <organization>Brocade Communication Systems, Inc.</organization> | ||||
| </author> | ||||
| <date month="October" year="2003"/> | ||||
| </front> | ||||
| <refcontent>ANSI/INCITS, 350-2003</refcontent> | ||||
| </reference> | ||||
| <!-- [rfced] The URL http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf | ||||
| does not work. Should the URL be removed or updated? | ||||
| Original: | ||||
| [57] Weber, R., "Object-Based Storage Device Commands (OSD)", | ||||
| ANSI/INCITS 400-2004, July 2004, | ||||
| <http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf>. | ||||
| --> | ||||
| <reference anchor="OSD-T10" target="http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf"> | ||||
| <front> | ||||
| <title>Object-Based Storage Device Commands (OSD)</title> | ||||
| <author initials="R.O." surname="Weber" fullname="Ralph O. Weber"> | ||||
| <organization>ENDL Texas</organization> | ||||
| </author> | ||||
| <date month="July" year="2004"/> | ||||
| </front> | ||||
| <refcontent>ANSI/INCITS, 400-2004</refcontent> | ||||
| </reference> | ||||
| <reference anchor="PVFS"> | ||||
| <front> | ||||
| <title>PVFS: A Parallel File System for Linux Clusters</title> | ||||
| <author initials="P. H." surname="Carns"> | ||||
| <organization> Parallel Architecture Research Laboratory, | ||||
| Clemson University, Clemson, SC 29634 </organization> | ||||
| </author> | ||||
| <author initials="W. B." surname="Ligon III"> | ||||
| <organization> Parallel Architecture Research Laboratory, | ||||
| Clemson University, Clemson, SC 29634 </organization> | ||||
| </author> | ||||
| <author initials="R. B." surname="Ross"> | ||||
| <organization> Parallel Architecture Research Laboratory, | ||||
| Clemson University, Clemson, SC 29634 </organization> | ||||
| </author> | ||||
| <author initials="R." surname="Thakur"> | ||||
| <organization>Mathematics and Computer Science Division, | ||||
| Argonne National Laboratory, Argonne, IL 60439</organization> | ||||
| </author> | ||||
| <date year="2000"/> | ||||
| </front> | ||||
| <refcontent>Proceedings of the 4th Annual Linux Showcase and Conference</refcontent> | ||||
| </reference> | ||||
| <reference anchor="access_api" target="https://www.opengroup.org"> | ||||
| <front> | ||||
| <title>The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition | ||||
| </title> | ||||
| <author> | ||||
| <organization>The Open Group | ||||
| </organization> | ||||
| </author> | ||||
| <date year="2004"/> | ||||
| <abstract> | ||||
| <t> | ||||
| The description of the access() function states: "If the process has appropriate privileges, an implementation may indicate success for X_OK even if none of the execute file permission bits are set." | ||||
| </t> | ||||
| </abstract> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2224.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2755.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8126.xml"/> | ||||
| <reference anchor="Err2006" quote-title="false" target="https://www.rfc-editor.org/errata/eid2006"> | ||||
| <front> | ||||
| <title>Erratum ID 2006</title> | ||||
| <author> | ||||
| <organization>RFC Errata</organization> | ||||
| </author> | ||||
| </front> | ||||
| <refcontent>RFC 5661</refcontent> | ||||
| </reference> | ||||
| <!-- [rfced] This URL appears to refer to a personal site. Is there a | ||||
| stable URL to which we can refer? | ||||
| Original: | ||||
| [64] Spasojevic, M. and M. Satayanarayanan, "An Empirical Study | ||||
| of a Wide-Area Distributed File System", May 1996, | ||||
| <https://www.cs.cmu.edu/~satya/docdir/spasojevic-tocs-afs- | ||||
| measurement-1996.pdf>. | ||||
| --> | ||||
| <reference anchor="AFS" target="https://www.cs.cmu.edu/~satya/docdir/spasojevic-tocs-afs-measurement-1996.pdf"> | ||||
| <front> | ||||
| <title> | ||||
| An Empirical Study of a Wide-Area Distributed File System | ||||
| </title> | ||||
| <author initials="M." surname="Spasojevic" fullname="Mirjana Spasojevic"> | ||||
| </author> | ||||
| <author initials="M." surname="Satyanarayanan" fullname="Mahadev Satyanarayanan"> | ||||
| </author> | ||||
| <date year="1996" month="May"/> | ||||
| </front> | ||||
| </reference> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5661.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8178.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7530.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7931.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8434.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7258.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3552.xml"/> | ||||
| </references> | ||||
| </references> | ||||
| <section anchor="NEED" numbered="true" toc="default"> | ||||
| <name>The Need for This Update</name> | ||||
| <t> | ||||
| This document includes an explanation of how clients and servers | ||||
| are to determine the particular network access paths to be used to access a | ||||
| file system. It describes | ||||
| how to handle changes to the specific replica to be used or to | ||||
| the set of addresses to be used to access it, | ||||
| and how to deal transparently with transfers of responsibility that need to be | ||||
| made. The explanation covers both cases in which | ||||
| there is a shift between one replica and another and those in | ||||
| which different network access paths are used to access the | ||||
| same replica. | ||||
| </t> | ||||
| <t> | ||||
| As a result of the following problems in RFC 5661 | ||||
| <xref target="RFC5661" format="default"/>, it | ||||
| was necessary to provide the specific updates that are made by this | ||||
| document. These updates are described in <xref target="CHG" format="default"/>. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Although RFC 5661 <xref target="RFC5661" format="default"/> dealt with situations in | ||||
| which various forms of clustering allowed cooperating servers | ||||
| to coordinate the state they assigned, it | ||||
| made no provisions for Transparent State Migration. Within NFSv4.0, | ||||
| Transparent State Migration was first explained clearly in | ||||
| RFC 7530 <xref target="RFC7530" format="default"/> and corrected and | ||||
| clarified by RFC 7931 <xref target="RFC7931" format="default"/>. No corresponding | ||||
| explanation for NFSv4.1 had been provided. | ||||
| </li> | ||||
| <li> | ||||
| Although NFSv4.1 provided a clear definition of how | ||||
| trunking detection was to be done, there was no clear specification | ||||
| of how trunking discovery was to be done, despite the fact that | ||||
| the specification clearly indicated that this information | ||||
| could be made available via the file system location attributes. | ||||
| </li> | ||||
| <li> | ||||
| Because the existence of multiple network access paths to the same | ||||
| file system was dealt with as if there were multiple replicas, issues relating to | ||||
| transitions between replicas could never be clearly distinguished | ||||
| from trunking-related transitions between the addresses used to | ||||
| access a particular file system instance. As a result, in situations in | ||||
| which both migration and trunking configuration changes | ||||
| were involved, neither of these could be clearly dealt with, and the relationship between | ||||
| these two features was not seriously addressed. | ||||
| </li> | ||||
| <li> | ||||
| Because use of two network access paths to the same file system | ||||
| instance (i.e., trunking) was often treated as if two replicas were | ||||
| involved, it was considered that two replicas were being used simultaneously. | ||||
| As a result, the treatment of replicas being used simultaneously | ||||
| in RFC 5661 <xref target="RFC5661" format="default"/> was not clear, as it covered the | ||||
| two distinct cases of a single file system instance being accessed by | ||||
| two different network access paths and two | ||||
| replicas being accessed simultaneously, with the limitations | ||||
| of the latter case not being clearly laid out. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The majority of the consequences of these issues are dealt with | ||||
| by presenting in <xref target="NEW11" format="default"/> a replacement | ||||
| for Section <xref target="RFC5661" sectionFormat="bare" section="11"/> | ||||
| of RFC 5661 <xref target="RFC5661"/>. This replacement | ||||
| modifies existing subsections within that section and adds new | ||||
| ones as described in <xref target="CHG-11" format="default"/>. Also, some existing | ||||
| sections were deleted. These changes were made in order to do the | ||||
| following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Reorganize the description so that the case of two network access paths to | ||||
| the same file system instance is distinguished clearly from the case of | ||||
| two different replicas since, in the former case, locking state is shared and there also | ||||
| can be sharing of session state. | ||||
| </li> | ||||
| <li> | ||||
| Provide a clear statement regarding the desirability of | ||||
| transparent transfer of state between replicas together with a recommendation | ||||
| that either transparent transfer or a single-fs grace period be provided. | ||||
| </li> | ||||
| <li> | ||||
| Specifically delineate how a client is to handle such transfers, | ||||
| taking into account the differences from the treatment | ||||
| in <xref target="RFC7931" format="default"/> made necessary by the major protocol | ||||
| changes to NFSv4.1. | ||||
| </li> | ||||
| <li> | ||||
| Discuss the relationship between transparent | ||||
| state transfer and Parallel NFS (pNFS). | ||||
| </li> | ||||
| <li> | ||||
| Clarify the fs_locations_info attribute in order to specify | ||||
| which portions of the provided information apply to a specific | ||||
| network access path and which apply to the replica that the path | ||||
| is used to access. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In addition, other sections of RFC 5661 <xref target="RFC5661" format="default"/> | ||||
| were updated to correct the consequences of the | ||||
| incorrect assumptions underlying the treatment of multi-server namespace | ||||
| issues. These are described in Appendices <xref target="CHG-ops" format="counter"/> through | ||||
| <xref target="CHG-other" format="counter"/>. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| A revised introductory section regarding multi-server namespace | ||||
| facilities is provided. | ||||
| </li> | ||||
| <li> | ||||
| A more realistic treatment of server scope is provided. This treatment | ||||
| reflects the more limited coordination of locking state | ||||
| adopted by servers actually sharing a common server scope. | ||||
| </li> | ||||
| <li> | ||||
| Some confusing text regarding changes in server_owner has | ||||
| been clarified. | ||||
| </li> | ||||
| <li> | ||||
| The description of some existing errors has been modified | ||||
| to more clearly explain certain error situations to reflect | ||||
| the existence of trunking and the possible use of fs-specific grace | ||||
| periods. For details, see <xref target="CHG-errs" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| New descriptions of certain existing operations are | ||||
| provided, either because the existing treatment did not | ||||
| account for situations that would arise in dealing with | ||||
| Transparent State Migration, or because some types of reclaim | ||||
| issues were not adequately dealt with in the context of fs-specific | ||||
| grace periods. For details, see <xref target="CHG-ops" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="CHG" numbered="true" toc="default"> | ||||
| <name>Changes in This Update</name> | ||||
| <section anchor="CHG-11" numbered="true" toc="default"> | ||||
| <name>Revisions Made to Section 11 of RFC 5661</name> | ||||
| <t> | ||||
| A number of areas have been revised or extended, in many cases | ||||
| replacing subsections within Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11"/> of RFC 5661 <xref target="RFC5661"/>: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| New introductory material, including a terminology section, | ||||
| replaces the material in RFC 5661 <xref target="RFC5661" format="default"/>, | ||||
| ranging from the start of the original Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11"/> up to and including | ||||
| Section <xref target="RFC5661" sectionFormat="bare" section="11.1"/>. | ||||
| The new material starts at the beginning of | ||||
| <xref target="NEW11" format="default"/> and continues | ||||
| through <xref target="SEC11-loc-attr" format="counter"/>. | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| A significant reorganization of the material in Sections | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.4"/> and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.5"/> of RFC 5661 | ||||
| <xref target="RFC5661"/> was necessary. The reasons for the reorganization of | ||||
| these sections into a single section with multiple subsections | ||||
| are discussed in <xref target="SEC11-uses-reorg" format="default"/> below. | ||||
| This replacement appears as <xref target="SEC11-USES" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| New material relating to the handling of the file system location | ||||
| attributes is contained in Sections <xref target="SEC11-USES-mult" format="counter"/> and | ||||
| <xref target="SEC11-USES-changes" format="counter"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| A new section describing requirements for user and group | ||||
| handling within a multi-server namespace has been added as | ||||
| <xref target="SEC11-users" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| A major replacement for Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7"/> of RFC 5661 <xref target="RFC5661"/>, | ||||
| entitled "Effecting File System Transitions", appears as Sections | ||||
| <xref target="SEC11-trans-oview" format="counter"/> through | ||||
| <xref target="SEC11-trans-server" format="counter"/>. | ||||
| The reasons for the reorganization of | ||||
| this section into multiple sections are discussed in | ||||
| <xref target="SEC11-trans-reorg" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| A replacement for Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.10"/> of RFC 5661 <xref target="RFC5661"/>, | ||||
| entitled "The Attribute fs_locations_info", appears as | ||||
| <xref target="SEC11-li-new" format="default"/>, with | ||||
| <xref target="SEC11-li-changes" format="default"/> describing the differences | ||||
| between the new section and the treatment within | ||||
| <xref target="RFC5661" format="default"/>. | ||||
| A revised treatment was necessary because the original treatment | ||||
| did not make clear how the added attribute information relates | ||||
| to the case of trunked paths to the same replica. These issues | ||||
| were not addressed in RFC 5661 <xref target="RFC5661" format="default"/> where the | ||||
| concepts of a replica and a network path used to access a replica | ||||
| were not clearly distinguished. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="SEC11-uses-reorg" toc="exclude" numbered="true"> | ||||
| <name>Reorganization of Sections 11.4 and 11.5 of RFC 5661</name> | ||||
| <t> | ||||
| Previously, issues related to the fact that multiple location | ||||
| entries directed the client to the same file system instance | ||||
| were dealt with in Section <xref target="RFC5661" sectionFormat="bare" section="11.5"/> of RFC 5661 <xref target="RFC5661"/>. | ||||
| Because of the new treatment of trunking, these issues now belong | ||||
| within <xref target="SEC11-USES" format="default"/>. | ||||
| </t> | ||||
| <t> | ||||
| In this new section, trunking is covered in | ||||
| <xref target="SEC11-USES-trunk" format="default"/> together with the other uses | ||||
| of file system location information described in Sections | ||||
| <xref target="SEC11-USES-types" format="counter"/> through | ||||
| <xref target="SEC11-USES-ref" format="counter"/>. | ||||
| </t> | ||||
| <t> | ||||
| As a result, <xref target="SEC11-USES" format="default"/>, which replaces | ||||
| Section <xref target="RFC5661" sectionFormat="bare" section="11.4"/> | ||||
| of RFC 5661 <xref target="RFC5661"/>, is substantially | ||||
| different from the section it replaces in that some original | ||||
| sections have been replaced by corresponding sections as described below, while | ||||
| new sections have been added: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The material in <xref target="SEC11-USES" format="default"/>, | ||||
| exclusive of subsections, replaces the material | ||||
| in Section <xref target="RFC5661" sectionFormat="bare" section="11.4"/> of RFC 5661 <xref target="RFC5661"/> exclusive of | ||||
| subsections. | ||||
| </li> | ||||
| <li> | ||||
| <xref target="SEC11-USES-mult" format="default"/> | ||||
| is the new first subsection of the overall section. | ||||
| </li> | ||||
| <li> | ||||
| <xref target="SEC11-USES-trunk" format="default"/> | ||||
| is the new second subsection of the overall section. | ||||
| </li> | ||||
| <li> | ||||
| Each of the Sections | ||||
| <xref target="SEC11-USES-repl" format="counter"/>, | ||||
| <xref target="SEC11-USES-migr" format="counter"/>, and | ||||
| <xref target="SEC11-USES-ref" format="counter"/> | ||||
| replaces (in order) one of the corresponding Sections | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.4.1"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.4.2"/>, and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.4.3"/> of RFC 5661 | ||||
| <xref target="RFC5661"/>. | ||||
| </li> | ||||
| <li> | ||||
| <xref target="SEC11-USES-changes" format="default"/> | ||||
| is the new final subsection of the overall section. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="SEC11-trans-reorg" toc="exclude" numbered="true"> | ||||
| <name>Reorganization of Material Dealing with File System Transitions</name> | ||||
| <t> | ||||
| The material relating to file system transition, previously contained | ||||
| in Section <xref target="RFC5661" sectionFormat="bare" section="11.7"/> of RFC 5661 <xref target="RFC5661"/> has | ||||
| been reorganized and augmented as described below: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Because there can be a shift of the network access paths used to | ||||
| access a file system instance without any shift between replicas, | ||||
| a new <xref target="SEC11-trans-oview" format="default"/> distinguishes | ||||
| between those cases in which there is a shift between | ||||
| distinct replicas and those involving a shift in network | ||||
| access paths with no shift between replicas. | ||||
| </t> | ||||
| <t> | ||||
| As a result, the new <xref target="SEC11-nwa" format="default"/> deals with network | ||||
| address transitions, while the bulk of the original Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7"/> of RFC | ||||
| 5661 <xref target="RFC5661"/> has been extensively modified as reflected in | ||||
| <xref target="SEC11-EFF" format="default"/>, which is now limited to cases | ||||
| in which there is a shift between two different sets of replicas. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| The additional <xref target="SEC11-trans-locking" format="default"/> discusses the | ||||
| case in which a shift to a different replica is made and state | ||||
| is transferred to give the client continued | ||||
| access to its accumulated locking state on the new server. | ||||
| </li> | ||||
| <li> | ||||
| The additional <xref target="SEC11-trans-client" format="default"/> discusses | ||||
| the client's response to access transitions, how it determines | ||||
| whether migration has occurred, and how it gets access to any | ||||
| transferred locking and session state. | ||||
| </li> | ||||
| <li> | ||||
| The additional <xref target="SEC11-trans-server" format="default"/> discusses the | ||||
| responsibilities of the source and destination servers when | ||||
| transferring locking and session state. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| This reorganization has caused a renumbering of the sections | ||||
| within <xref target="RFC5661" sectionFormat="of" section="11"/> as described below: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The new Sections <xref target="SEC11-trans-oview" format="counter"/> | ||||
| and <xref target="SEC11-nwa" format="counter"/> have resulted | ||||
| in the renumbering of existing sections with these numbers. | ||||
| </li> | ||||
| <li> | ||||
| <xref target="RFC5661" sectionFormat="of" section="11.7"/> has been substantially | ||||
| modified and appears as <xref target="SEC11-EFF" format="default"/>. The necessary | ||||
| modifications reflect the fact that this section only deals | ||||
| with transitions between replicas, while transitions between | ||||
| network addresses are dealt with in other sections. Details | ||||
| of the reorganization are described later in this section. | ||||
| </li> | ||||
| <li> | ||||
| Sections | ||||
| <xref target="SEC11-trans-locking" format="counter"/>, | ||||
| <xref target="SEC11-trans-client" format="counter"/>, and | ||||
| <xref target="SEC11-trans-server" format="counter"/> have been | ||||
| added. | ||||
| </li> | ||||
| <li> | ||||
| Consequently, Sections <xref target="RFC5661" sectionFormat="bare" section="11.8"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.9"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.10"/>, and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.11"/> in | ||||
| <xref target="RFC5661" format="default"/> now appear | ||||
| as Sections <xref target="effecting_referrals" format="counter"/>, | ||||
| <xref target="fs_locations" format="counter"/>, | ||||
| <xref target="SEC11-li-new" format="counter"/>, and | ||||
| <xref target="fs_status" format="counter"/>, | ||||
| respectively. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As part of this general reorganization, | ||||
| Section <xref target="RFC5661" sectionFormat="bare" section="11.7"/> of RFC 5661 <xref target="RFC5661"/> | ||||
| has been modified as described below: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Sections <xref target="RFC5661" sectionFormat="bare" section="11.7"/> and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.1"/> of RFC 5661 <xref target="RFC5661" format="default"/> | ||||
| have been replaced by Sections | ||||
| <xref target="SEC11-EFF" format="counter"/> and | ||||
| <xref target="SEC11-EFF-simul" format="counter"/>, respectively. | ||||
| </li> | ||||
| <li> | ||||
| Section <xref target="RFC5661" sectionFormat="bare" section="11.7.2"/> | ||||
| of RFC 5661 (and included subsections) has been deleted. | ||||
| </li> | ||||
| <li> | ||||
| Sections <xref target="RFC5661" sectionFormat="bare" section="11.7.3"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.4"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.5"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.5.1"/>, and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.6"/> of RFC 5661 | ||||
| <xref target="RFC5661" format="default"/> have been replaced by Sections | ||||
| <xref target="SEC11-EFF-fh" format="counter"/>, | ||||
| <xref target="SEC11-EFF-fileid" format="counter"/>, | ||||
| <xref target="SEC11-EFF-fsid" format="counter"/>, | ||||
| <xref target="SEC11-EFF-fsid-split" format="counter"/>, and | ||||
| <xref target="SEC11-EFF-change" format="counter"/> | ||||
| respectively in this document. | ||||
| </li> | ||||
| <li> | ||||
| Section <xref target="RFC5661" sectionFormat="bare" section="11.7.7"/> | ||||
| of RFC 5661 <xref target="RFC5661"/> has been replaced by | ||||
| <xref target="SEC11-EFF-lock" format="default"/>. This subsection has been | ||||
| moved to the end of the section dealing with file system transitions. | ||||
| </li> | ||||
| <li> | ||||
| Sections <xref target="RFC5661" sectionFormat="bare" section="11.7.8"/>, | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.9"/>, and | ||||
| <xref target="RFC5661" sectionFormat="bare" section="11.7.10"/> of RFC 5661 | ||||
| <xref target="RFC5661" format="default"/> have been replaced by Sections | ||||
| <xref target="SEC11-EFF-wv" format="counter"/>, | ||||
| <xref target="SEC11-EFF-rdc" format="counter"/>, and | ||||
| <xref target="SEC11-EFF-data" format="counter"/> | ||||
| respectively in this document. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="SEC11-li-changes" toc="exclude" numbered="true"> | ||||
| <name>Updates to the Treatment of fs_locations_info</name> | ||||
| <t> | ||||
| Various elements of the fs_locations_info attribute contain | ||||
| information that applies to either a specific file system replica | ||||
| or to a network path or set of network paths used to access such a replica. | ||||
| The original treatment of fs_locations_info (Section <xref target="RFC5661" sectionFormat="bare" section="11.10"/> of RFC 5661 <xref target="RFC5661"/>) | ||||
| did not clearly distinguish these cases, in | ||||
| part because the document did not clearly distinguish replicas from | ||||
| the paths used to access them. | ||||
| </t> | ||||
| <t> | ||||
| In addition, special clarification has been provided with regard | ||||
| to the following fields: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| With regard to the handling of FSLI4GF_GOING, it was | ||||
| clarified that this only applies to the unavailability of a | ||||
| replica rather than to a path to access a replica. | ||||
| </li> | ||||
| <li> | ||||
| In describing the appropriate value for a server to use for | ||||
| fli_valid_for, it was clarified that there is no | ||||
| need for the client to frequently fetch the fs_locations_info | ||||
| value to be prepared for shifts in trunking patterns. | ||||
| </li> | ||||
| <li> | ||||
| Clarification of the rules for extensions to the fls_info has | ||||
| been provided. The original treatment reflected the extension | ||||
| model that was in effect at the time RFC 5661 <xref target="RFC5661" format="default"/> | ||||
| was written, but has been updated in accordance with the extension model | ||||
| described in RFC 8178 <xref target="RFC8178" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="CHG-ops" numbered="true" toc="default"> | ||||
| <name>Revisions Made to Operations in RFC 5661</name> | ||||
| <t> | ||||
| Descriptions have been revised to address issues that arose in | ||||
| effecting necessary changes to multi-server namespace features. | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The treatment of EXCHANGE_ID (Section <xref target="RFC5661" sectionFormat="bare" section="18.35"/> of RFC 5661 <xref target="RFC5661"/>) assumed that client IDs | ||||
| cannot be created/confirmed other than by the EXCHANGE_ID and CREATE_SESSION | ||||
| operations. Also, the necessary use of EXCHANGE_ID in recovery | ||||
| from migration and related situations was not clearly addressed. | ||||
| A revised treatment of EXCHANGE_ID was necessary, and it appears in | ||||
| <xref target="OP_EXCHANGE_ID" format="default"/>, while the specific differences | ||||
| between it and the treatment within <xref target="RFC5661" format="default"/> | ||||
| are explained in <xref target="OTH-eid" format="default"/> below. | ||||
| </li> | ||||
| <li> | ||||
| The treatment of RECLAIM_COMPLETE in Section <xref target="RFC5661" sectionFormat="bare" section="18.51"/> of RFC 5661 <xref target="RFC5661"/> was not sufficiently clear about the | ||||
| purpose and use of the rca_one_fs argument and how the server was to deal | ||||
| with inappropriate values of this argument. Because the | ||||
| resulting confusion raised interoperability issues, a new treatment | ||||
| of RECLAIM_COMPLETE was necessary, and it appears in | ||||
| <xref target="OP_RECLAIM_COMPLETE" format="default"/>, while the specific differences | ||||
| between it and the treatment within RFC 5661 <xref target="RFC5661" format="default"/> | ||||
| are discussed in <xref target="OTH-rc" format="default"/> below. In addition, the | ||||
| definitions of the reclaim-related errors have received an updated | ||||
| treatment in <xref target="errors_reclaim" format="default"/> to reflect the fact | ||||
| that there are multiple contexts for lock reclaim operations. | ||||
| </li> | ||||
| </ul> | ||||
| <section anchor="OTH-eid" toc="exclude" numbered="true"> | ||||
| <name>Revision of Treatment of EXCHANGE_ID</name> | ||||
| <t> | ||||
| There were a number of issues in the original treatment of | ||||
| EXCHANGE_ID in RFC 5661 <xref target="RFC5661" format="default"/> that caused problems | ||||
| for Transparent State Migration and for the transfer of access | ||||
| between different network access paths to the same file system instance. | ||||
| </t> | ||||
| <t> | ||||
| These issues arose from the fact that this treatment was written: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| Assuming that a client ID can only become known to a server | ||||
| by having been created by executing an EXCHANGE_ID, with | ||||
| confirmation of the ID only possible by execution of a | ||||
| CREATE_SESSION. | ||||
| </li> | ||||
| <li> | ||||
| Considering the interactions between a client and a server as | ||||
| occurring only on a single network address. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| As these assumptions have become invalid in the context of | ||||
| Transparent State Migration and active use of trunking, | ||||
| the treatment has been modified in several respects: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| It had been assumed that an EXCHANGE_ID executed when the server | ||||
| was already aware of a given client instance was either updating | ||||
| associated parameters (e.g., with respect to callbacks) or dealing | ||||
| with a previously lost reply by retransmitting. As a | ||||
| result, any slot sequence returned by that operation | ||||
| would be of no use. The original treatment | ||||
| went so far as to say that it "<bcp14>MUST NOT</bcp14>" be used, although | ||||
| this usage was not in accord with <xref target="RFC2119" format="default"/>. | ||||
| This created a difficulty when an EXCHANGE_ID is done after Transparent State | ||||
| Migration since that slot sequence would need to be used in a | ||||
| subsequent CREATE_SESSION. | ||||
| </t> | ||||
| <t> | ||||
| In the updated treatment, CREATE_SESSION is one way in which client | ||||
| IDs are confirmed, but it is understood that other ways are | ||||
| possible. The slot sequence can be used as needed, and cases | ||||
| in which it would be of no use are appropriately noted. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| It had been assumed that the only functions of EXCHANGE_ID were to | ||||
| inform the server of the client, to create the client ID, | ||||
| and to communicate it to the client. When multiple | ||||
| simultaneous connections are involved, as often happens when | ||||
| trunking, that treatment was inadequate in that it ignored the | ||||
| role of EXCHANGE_ID in associating the client ID with the | ||||
| connection on which it was done, so that it could be used | ||||
| by a subsequent CREATE_SESSION whose parameters do not | ||||
| include an explicit client ID. | ||||
| </t> | ||||
| <t> | ||||
| The new treatment explicitly discusses the role of EXCHANGE_ID | ||||
| in associating the client ID with the connection so it | ||||
| can be used by CREATE_SESSION and in associating a connection with an | ||||
| existing session. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The new treatment can be found in <xref target="OP_EXCHANGE_ID" format="default"/> | ||||
| above. It supersedes the treatment in Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="18.35"/> of RFC 5661 <xref target="RFC5661"/>. | ||||
| </t> | ||||
| </section> | ||||
| <section anchor="OTH-rc" toc="exclude" numbered="true"> | ||||
| <name>Revision of Treatment of RECLAIM_COMPLETE</name> | ||||
| <t> | ||||
| The following changes were made to the treatment of | ||||
| RECLAIM_COMPLETE in RFC 5661 <xref target="RFC5661" format="default"/> to arrive at the | ||||
| treatment in <xref target="OP_RECLAIM_COMPLETE" format="default"/>: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| In a number of places, the text was made more explicit about the | ||||
| purpose of rca_one_fs and its connection to file system | ||||
| migration. | ||||
| </li> | ||||
| <li> | ||||
| There is a discussion of situations in which particular forms of | ||||
| RECLAIM_COMPLETE would need to be done. | ||||
| </li> | ||||
| <li> | ||||
| There is a discussion of interoperability issues between | ||||
| implementations that may have arisen due to the lack of | ||||
| clarity of the previous treatment of RECLAIM_COMPLETE. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="CHG-errs" numbered="true" toc="default"> | ||||
| <name>Revisions Made to Error Definitions in RFC 5661</name> | ||||
| <t> | ||||
| The new handling of various situations required revisions to | ||||
| some existing error definitions: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| Because of the need to appropriately address trunking-related | ||||
| issues, some uses of the term "replica" in RFC 5661 | ||||
| <xref target="RFC5661" format="default"/> | ||||
| became problematic because a shift in network access paths was | ||||
| considered to be a shift to a different replica. As a result, | ||||
| the original definition of NFS4ERR_MOVED (in Section <xref target="RFC5661" sectionFormat="bare" section="15.1.2.4"/> of RFC 5661 <xref target="RFC5661"/>) was updated to reflect the | ||||
| different handling of unavailability of a particular fs via a | ||||
| specific network address. | ||||
| </t> | ||||
| <t> | ||||
| Since such a situation is no longer | ||||
| considered to constitute unavailability of a file system | ||||
| instance, the description has been changed, even though the set of circumstances in | ||||
| which NFS4ERR_MOVED is to be returned remains the same. | ||||
| The new paragraph explicitly recognizes that a different network | ||||
| address might be used, while the previous description, misleadingly, | ||||
| treated this as a shift between two replicas even though only a single | ||||
| file system instance might be involved. The updated description | ||||
| appears in <xref target="err_MOVED" format="default"/>. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| Because of the need to accommodate the use of fs-specific grace periods, | ||||
| it was necessary to clarify some of the definitions of | ||||
| reclaim-related errors in Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="15"/> of RFC 5661 | ||||
| <xref target="RFC5661"/> | ||||
| so that the text applies properly to reclaims for all types of grace | ||||
| periods. The updated descriptions | ||||
| appear within <xref target="errors_reclaim" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| Because of the need to provide the clarifications in errata | ||||
| report 2006 <xref target="Err2006" format="default"/> | ||||
| and to adapt these to properly explain the interaction of | ||||
| NFS4ERR_DELAY with the reply cache, a revised description | ||||
| of NFS4ERR_DELAY appears in <xref target="err_DELAY" format="default"/>. This | ||||
| errata report, unlike many other RFC 5661 errata reports, is | ||||
| addressed in this | ||||
| document because of the extensive use of NFS4ERR_DELAY | ||||
| in connection with state migration and session migration. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section anchor="CHG-other" numbered="true" toc="default"> | ||||
| <name>Other Revisions Made to RFC 5661</name> | ||||
| <t> | ||||
| Besides the major reworking of Section <xref target="RFC5661" sectionFormat="bare" section="11"/> of RFC 5661 <xref target="RFC5661"/> and the associated revisions to | ||||
| existing operations and errors, there were a number of related changes that were necessary: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The summary in Section <xref target="RFC5661" sectionFormat="bare" section="1.7.3.3"/> | ||||
| of RFC 5661 <xref target="RFC5661"/> was revised to reflect the changes made to | ||||
| <xref target="NEW11" format="default"/> above. The updated summary appears as | ||||
| <xref target="PREP-intro" format="default"/> above. | ||||
| </li> | ||||
| <li> | ||||
| The discussion of server scope in Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="2.10.4"/> of RFC 5661 | ||||
| <xref target="RFC5661"/> was replaced since it | ||||
| appeared to require a level of inter-server coordination | ||||
| incompatible with its basic function of avoiding the need for | ||||
| a globally uniform means of assigning server_owner values. | ||||
| A revised treatment appears in <xref target="Server_Scope" format="default"/>. | ||||
| </li> | ||||
| <li> | ||||
| The discussion of trunking in Section | ||||
| <xref target="RFC5661" sectionFormat="bare" section="2.10.5"/> of RFC 5661 <xref target="RFC5661"/> | ||||
| was revised to more clearly | ||||
| explain the multiple types of trunking support and how the | ||||
| client can be made aware of the existing trunking configuration. | ||||
| In addition, while the last paragraph (exclusive of subsections) of | ||||
| that section dealing with server_owner changes was literally true, | ||||
| it had been a source of confusion. Since the original paragraph could be read as | ||||
| suggesting that such changes be handled nondisruptively, the | ||||
| issue was clarified in the revised <xref target="Trunking" format="default"/>. | ||||
| </li> | ||||
| </ul> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="SECBAD" numbered="true" toc="default"> | ||||
| <name>Security Issues That Need to Be Addressed</name> | ||||
| <t> | ||||
| The following issues in the treatment of security within the NFSv4.1 | ||||
| specification need to be addressed: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The Security Considerations section of RFC 5661 <xref target="RFC5661" format="default"/> | ||||
| was not written in accordance with RFC 3552 (BCP 72) <xref target="RFC3552" format="default"/>. | ||||
| Of particular concern was the fact that the section | ||||
| did not contain a threat analysis. | ||||
| </li> | ||||
| <li> | ||||
| Initial analysis of the existing security issues with NFSv4.1 suggests | ||||
| that a revised Security Considerations section for the | ||||
| existing protocol (one containing a threat analysis) would be likely | ||||
| to conclude that NFSv4.1 does not meet the goal of secure use on the | ||||
| Internet. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| The Security Considerations section of | ||||
| this document (<xref target="SECCON" format="default"/>) has not been thoroughly | ||||
| revised to correct the difficulties mentioned above. Instead, it has been | ||||
| modified to take proper account of issues related to the multi-server | ||||
| namespace features discussed in <xref target="NEW11" format="default"/>, leaving the | ||||
| incomplete discussion and security weaknesses largely as they were. | ||||
| </t> | ||||
| <t> | ||||
| The following major security issues need to be addressed in a | ||||
| satisfactory fashion before an updated Security Considerations section | ||||
| can be published as part of a bis document for NFSv4.1: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| <t> | ||||
| The continued use of AUTH_SYS and the security exposures it creates | ||||
| need to be addressed. Addressing this issue must not be limited to | ||||
| the questions of whether the designation of this as <bcp14>OPTIONAL</bcp14> was | ||||
| justified and whether it should be changed. | ||||
| </t> | ||||
| <t> | ||||
| In any event, it may not be possible at this point to correct the | ||||
| security problems created by continued use of AUTH_SYS simply by | ||||
| revising this designation. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The lack of attention within the protocol to the possibility of | ||||
| pervasive monitoring attacks such as those described in RFC 7258 | ||||
| <xref target="RFC7258" format="default"/> (also BCP 188) needs to be addressed. | ||||
| </t> | ||||
| <t> | ||||
| In that connection, the use of CREATE_SESSION without privacy protection needs to be addressed | ||||
| as it exposes the session ID to view by an attacker. This is worrisome as this is precisely the type | ||||
| of protocol artifact alluded to in RFC 7258, | ||||
| which can enable further mischief on the part of | ||||
| the attacker as it enables denial-of-service attacks that can be | ||||
| executed effectively with only a single, normally low-value, | ||||
| credential, even when RPCSEC_GSS authentication is in use. | ||||
| </t> | ||||
| </li> | ||||
| <li> | ||||
| <t> | ||||
| The lack of effective use of privacy and integrity, even where the | ||||
| infrastructure to support use of RPCSEC_GSS is present, | ||||
| needs to be addressed. | ||||
| </t> | ||||
| <t> | ||||
| In light of the security exposures that | ||||
| this situation creates, it is not enough to define a protocol that | ||||
| could address this problem with the provision of sufficient resources. | ||||
| Instead, what is needed is a way to provide the necessary security | ||||
| with very limited performance costs and without requiring | ||||
| security infrastructure, which experience has shown is difficult for | ||||
| many clients and servers to provide. | ||||
| </t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| In trying to provide a major security upgrade for a deployed protocol | ||||
| such as NFSv4.1, the working group and the Internet community are likely | ||||
| to find themselves dealing with a number of considerations such as the | ||||
| following: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| The need to accommodate existing deployments of protocols | ||||
| specified previously in existing Proposed Standards. | ||||
| </li> | ||||
| <li> | ||||
| The difficulty of effecting changes to existing, interoperating | ||||
| implementations. | ||||
| </li> | ||||
| <li> | ||||
| The difficulty of making changes to NFSv4 protocols other than those in | ||||
| the form of <bcp14>OPTIONAL</bcp14> extensions. | ||||
| </li> | ||||
| <li> | ||||
| The tendency of those responsible for existing NFSv4 deployments to | ||||
| ignore security flaws in the context of local area networks under | ||||
| the mistaken impression that network isolation provides, in and of itself, isolation from | ||||
| all potential attackers. | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| Given that the above-mentioned difficulties apply to minor | ||||
| version zero as well, it may make sense to deal with these security issues | ||||
| in a common document that applies to all NFSv4 minor versions. If | ||||
| that approach is taken, the Security Considerations section of an eventual NFSv4.1 bis | ||||
| document would reference that common document, and the defining | ||||
| RFCs for other minor versions might do so as well. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="false" toc="default"> | ||||
| <name>Acknowledgments</name> | ||||
| <section toc="exclude" numbered="false"> | ||||
| <name>Acknowledgments for This Update</name> | ||||
| <t> | ||||
| The authors wish to acknowledge the important role | ||||
| of <contact fullname="Andy Adamson"/> of NetApp | ||||
| in clarifying the need for trunking discovery functionality, and | ||||
| exploring the role of the file system location attributes in | ||||
| providing the | ||||
| necessary support. | ||||
| </t> | ||||
| <t> | ||||
| The authors wish to thank <contact fullname="Tom Haynes"/> of Hammerspace for drawing our | ||||
| attention to the fact that internationalization and security might | ||||
| best be handled in documents dealing with such protocol issues as they | ||||
| apply to all NFSv4 minor versions. | ||||
| </t> | ||||
| <t> | ||||
| The authors also wish to acknowledge the work of <contact fullname="Xuan Qi"/> of Oracle | ||||
| with NFSv4.1 client and server prototypes of Transparent State | ||||
| Migration functionality. | ||||
| </t> | ||||
| <t> | ||||
| The authors wish to thank others who brought attention to important | ||||
| issues. The comments of <contact fullname="Trond Myklebust"/> of Primary Data related | ||||
| to trunking helped to clarify the role of DNS in | ||||
| trunking discovery. <contact fullname="Rick Macklem"/>'s comments brought attention to | ||||
| problems in the handling of the per-fs version of | ||||
| RECLAIM_COMPLETE. | ||||
| </t> | ||||
| <t> | ||||
| The authors wish to thank <contact fullname="Olga Kornievskaia"/> of NetApp for her helpful | ||||
| review comments. | ||||
| </t> | ||||
| </section> | ||||
| <section toc="exclude" numbered="false"> | ||||
| <name>Acknowledgments for RFC 5661</name> | ||||
| <t> | ||||
| The initial text for the SECINFO extensions was edited by | ||||
| <contact fullname="Mike Eisler"/> with contributions from <contact fullname="Peng Dai"/>, <contact fullname="Sergey Klyushin"/>, and | ||||
| <contact fullname="Carl Burnett"/>. | ||||
| </t> | ||||
| <t> | ||||
| The initial text for the SESSIONS extensions was edited by | ||||
| <contact fullname="Tom Talpey"/>, <contact fullname="Spencer Shepler"/>, | ||||
| and <contact fullname="Jon Bauman"/> with contributions from | ||||
| <contact fullname="Charles Antonelli"/>, <contact fullname="Brent Callaghan"/>, <contact fullname="Mike Eisler"/>, <contact fullname="John Howard"/>, <contact fullname="Chet Juszczak"/>, <contact fullname="Trond Myklebust"/>, <contact fullname="Dave Noveck"/>, <contact fullname="John Scott"/>, <contact fullname="Mike Stolarchuk"/>, and <contact fullname="Mark Wittle"/>. | ||||
| </t> | ||||
| <t> | ||||
| Initial text relating to multi-server namespace features, | ||||
| including the concept of referrals, was contributed by | ||||
| <contact fullname="Dave Noveck"/>, <contact fullname="Carl Burnett"/>, | ||||
| and <contact fullname="Charles Fan"/> with contributions | ||||
| from <contact fullname="Ted Anderson"/>, <contact fullname="Neil Brown"/>, and <contact fullname="Jon Haswell"/>. | ||||
| </t> | ||||
| <t> | ||||
| The initial text for the Directory Delegations support was | ||||
| contributed by <contact fullname="Saadia Khan"/> with input from | ||||
| <contact fullname="Dave Noveck"/>, <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Carl Burnett"/>, <contact fullname="Ted Anderson"/>, | ||||
| and <contact fullname="Tom Talpey"/>. | ||||
| </t> | ||||
| <t> | ||||
| The initial text for the ACL explanations was contributed by | ||||
| <contact fullname="Sam Falkner"/> and <contact fullname="Lisa Week"/>. | ||||
| </t> | ||||
| <t> | ||||
| The pNFS work was inspired by the NASD and OSD | ||||
| work done by <contact fullname="Garth Gibson"/>. <contact fullname="Gary Grider"/> has also | ||||
| been a champion of high-performance parallel I/O. | ||||
| <contact fullname="Garth Gibson"/> and <contact fullname="Peter Corbett"/> started the pNFS | ||||
| effort with a problem statement document for the IETF | ||||
| that formed the basis for the pNFS work in NFSv4.1. | ||||
| </t> | ||||
| <t> | ||||
| The initial text for the parallel NFS support was edited by | ||||
| <contact fullname="Brent Welch"/> and <contact fullname="Garth Goodson"/>. Additional authors for those | ||||
| documents were <contact fullname="Benny Halevy"/>, <contact fullname="David Black"/>, and <contact fullname="Andy Adamson"/>. | ||||
| Additional input came from the informal group that contributed | ||||
| to the construction of the initial pNFS drafts; specific | ||||
| acknowledgment goes to <contact fullname="Gary Grider"/>, <contact fullname="Peter Corbett"/>, <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Peter Honeyman"/>, and <contact fullname="Stephen Fridella"/>. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Fredric Isaman"/> found several errors in draft versions of the | ||||
| ONC RPC XDR description of the NFSv4.1 protocol. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Audrey Van Belleghem"/> provided, in numerous ways, essential | ||||
| coordination and management of the process of editing the | ||||
| specification documents. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Richard Jernigan"/> gave feedback on the file layout's striping | ||||
| pattern design. | ||||
| </t> | ||||
| <t> | ||||
| Several formal inspection teams were formed to review various | ||||
| areas of the protocol. All the inspections found significant | ||||
| errors and room for improvement. NFSv4.1's inspection teams | ||||
| were: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li><t> | ||||
| ACLs, with the following inspectors: | ||||
| <contact fullname="Sam Falkner"/>, | ||||
| <contact fullname="Bruce Fields"/>, | ||||
| <contact fullname="Rahul Iyer"/>, | ||||
| <contact fullname="Saadia Khan"/>, | ||||
| <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Lisa Week"/>, | ||||
| <contact fullname="Mario Wurzl"/>, | ||||
| and | ||||
| <contact fullname="Alan Yoder"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| Sessions, with the following inspectors: | ||||
| <contact fullname="William Brown"/>, | ||||
| <contact fullname="Tom Doeppner"/>, | ||||
| <contact fullname="Robert Gordon"/>, | ||||
| <contact fullname="Benny Halevy"/>, | ||||
| <contact fullname="Fredric Isaman"/>, | ||||
| <contact fullname="Rick Macklem"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Karen Rochford"/>, | ||||
| <contact fullname="John Scott"/>, | ||||
| and | ||||
| <contact fullname="Peter Shah"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| Initial pNFS inspection, with the following inspectors: | ||||
| <contact fullname="Andy Adamson"/>, | ||||
| <contact fullname="David Black"/>, | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Marc Eshel"/>, | ||||
| <contact fullname="Sam Falkner"/>, | ||||
| <contact fullname="Garth Goodson"/>, | ||||
| <contact fullname="Benny Halevy"/>, | ||||
| <contact fullname="Rahul Iyer"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| and | ||||
| <contact fullname="Lisa Week"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| Global namespace, with the following inspectors: | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Dan Ellard"/>, | ||||
| <contact fullname="Craig Everhart"/>, | ||||
| <contact fullname="Fredric Isaman"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Theresa Raj"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| <contact fullname="Renu Tewari"/>, | ||||
| and | ||||
| <contact fullname="Robert Thurlow"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| NFSv4.1 file layout type, with the following inspectors: | ||||
| <contact fullname="Andy Adamson"/>, | ||||
| <contact fullname="Marc Eshel"/>, | ||||
| <contact fullname="Sam Falkner"/>, | ||||
| <contact fullname="Garth Goodson"/>, | ||||
| <contact fullname="Rahul Iyer"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| and | ||||
| <contact fullname="Lisa Week"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| NFSv4.1 locking and directory delegations, with the following inspectors: | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Pranoop Erasani"/>, | ||||
| <contact fullname="Robert Gordon"/>, | ||||
| <contact fullname="Saadia Khan"/>, | ||||
| <contact fullname="Eric Kustarz"/>, | ||||
| <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| and | ||||
| <contact fullname="Amy Weaver"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| EXCHANGE_ID and DESTROY_CLIENTID, with the following inspectors: | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Pranoop Erasani"/>, | ||||
| <contact fullname="Robert Gordon"/>, | ||||
| <contact fullname="Benny Halevy"/>, | ||||
| <contact fullname="Fredric Isaman"/>, | ||||
| <contact fullname="Saadia Khan"/>, | ||||
| <contact fullname="Ricardo Labiaga"/>, | ||||
| <contact fullname="Rick Macklem"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| and | ||||
| <contact fullname="Brent Welch"/>.</t> | ||||
| </li> | ||||
| <li><t> | ||||
| Final pNFS inspection, with the following inspectors: | ||||
| <contact fullname="Andy Adamson"/>, | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Marc Eshel"/>, | ||||
| <contact fullname="Sam Falkner"/>, | ||||
| <contact fullname="Jason Glasgow"/>, | ||||
| <contact fullname="Garth Goodson"/>, | ||||
| <contact fullname="Robert Gordon"/>, | ||||
| <contact fullname="Benny Halevy"/>, | ||||
| <contact fullname="Dean Hildebrand"/>, | ||||
| <contact fullname="Rahul Iyer"/>, | ||||
| <contact fullname="Suchit Kaura"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Anatoly Pinchuk"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| <contact fullname="Renu Tewari"/>, | ||||
| <contact fullname="Lisa Week"/>, | ||||
| and | ||||
| <contact fullname="Brent Welch"/>.</t> | ||||
| </li> | ||||
| </ul> | ||||
| <t> | ||||
| A review team worked together to generate the tables of assignments of | ||||
| error sets to operations and make sure that each such assignment had | ||||
| two or more people validating it. Participating in the process were | ||||
| <contact fullname="Andy Adamson"/>, | ||||
| <contact fullname="Mike Eisler"/>, | ||||
| <contact fullname="Sam Falkner"/>, | ||||
| <contact fullname="Garth Goodson"/>, | ||||
| <contact fullname="Robert Gordon"/>, | ||||
| <contact fullname="Trond Myklebust"/>, | ||||
| <contact fullname="Dave Noveck"/>, | ||||
| <contact fullname="Spencer Shepler"/>, | ||||
| <contact fullname="Tom Talpey"/>, | ||||
| <contact fullname="Amy Weaver"/>, | ||||
| and | ||||
| <contact fullname="Lisa Week"/>. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Jari Arkko"/>, <contact fullname="David Black"/>, | ||||
| <contact fullname="Scott Bradner"/>, <contact fullname="Lisa Dusseault"/>, <contact fullname="Lars Eggert"/>, <contact fullname="Chris Newman"/>, and <contact fullname="Tim Polk"/> provided valuable review and guidance. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Olga Kornievskaia"/> found several errors in the SSV specification. | ||||
| </t> | ||||
| <t> | ||||
| <contact fullname="Ricardo Labiaga"/> found several places where the use of RPCSEC_GSS | ||||
| was underspecified. | ||||
| </t> | ||||
| <t> | ||||
| Those who provided miscellaneous comments include: | ||||
| <contact fullname="Andy Adamson"/>, <contact fullname="Sunil Bhargo"/>, | ||||
| <contact fullname="Alex Burlyga"/>, <contact fullname="Pranoop Erasani"/>, | ||||
| <contact fullname="Bruce Fields"/>, <contact fullname="Vadim Finkelstein"/>, <contact fullname="Jason Goldschmidt"/>, <contact fullname="Vijay K. Gurbani"/>, <contact fullname="Sergey Klyushin"/>, <contact fullname="Ricardo Labiaga"/>, <contact fullname="James Lentini"/>, <contact fullname="Anshul Madan"/>, <contact fullname="Daniel Muntz"/>, <contact fullname="Daniel Picken"/>, <contact fullname="Archana Ramani"/>, <contact fullname="Jim Rees"/>, <contact fullname="Mahesh Siddheshwar"/>, <contact fullname="Tom Talpey"/>, and <contact fullname="Peter Varga"/>. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| </back> | ||||
| </rfc> | ||||