NEX-15740 NFS deadlock in rfs4_compound with hundreds of threads waiting for lock owned by rfs4_op_rename (lint fix)
NEX-15740 NFS deadlock in rfs4_compound with hundreds of threads waiting for lock owned by rfs4_op_rename
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-16917 Need to reduce the impact of NFS per-share kstats on failover
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16835 Kernel panic during BDD tests at rfs4_compound func
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-15924 Getting panic: BAD TRAP: type=d (#gp General protection) rp=ffffff0021464690 addr=12
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16812 Timing window where dtrace probe could try to access share info after unshared
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16452 NFS server in a zone state database needs to be per zone
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-15279 support NFS server in zone
NEX-15520 online NFS shares cause zoneadm halt to hang in nfs_export_zone_fini
Portions contributed by: Dan Kruchinin <dan.kruchinin@nexenta.com>
Portions contributed by: Stepan Zastupov <stepan.zastupov@gmail.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-9275 Got "bad mutex" panic when run IO to nfs share from clients
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-7366 Getting panic in "module "nfssrv" due to a NULL pointer dereference" when updating NFS shares on a pool
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-6778 NFS kstats leak and cause system to hang
Revert "NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats"
This reverts commit 586c3ab1927647487f01c337ddc011c642575a52.
Revert "NEX-5354 Aggregated IOPS, bandwidth, and latency kstats for NFS server"
This reverts commit c91d7614da8618ef48018102b077f60ecbbac8c2.
Revert "NEX-5667 nfssrv_stats_flags does not work for aggregated kstats"
This reverts commit 3dcf42618be7dd5f408c327f429c81e07ca08e74.
Revert "NEX-5750 Time values for aggregated NFS server kstats should be normalized"
This reverts commit 1f4d4f901153b0191027969fa4a8064f9d3b9ee1.
Revert "NEX-5942 Panic in rfs4_minorvers_mismatch() with NFSv4.1 client"
This reverts commit 40766417094a162f5e4cc8786c0fa0a7e5871cd9.
Revert "NEX-5752 NFS server: namespace collision in kstats"
This reverts commit ae81e668db86050da8e483264acb0cce0444a132.
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6109 NFS client panics in nfssrv when running nfsv4-test basic_ops STC tests
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-5134 Deadlock between rfs4_do_lock() and rfs4_op_read()
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3311 NFSv4: setlock() can spin forever
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3097 IOPS, bandwidth, and latency kstats for NFS server
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-1128 NFS server: Generic uid and gid remapping for AUTH_SYS
Reviewed by: Jan Kryl <jan.kryl@nexenta.com>
OS-72 NULL pointer dereference in rfs4_op_setclientid()
Reviewed by: Dan McDonald <danmcd@nexenta.com>

*** 18,37 **** * * CDDL HEADER END */ /* - * Copyright 2016 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2016 by Delphix. All rights reserved. */ /* * Copyright (c) 1983,1984,1985,1986,1987,1988,1989 AT&T. * All Rights Reserved */ #include <sys/param.h> #include <sys/types.h> #include <sys/systm.h> #include <sys/cred.h> #include <sys/buf.h> --- 18,40 ---- * * CDDL HEADER END */ /* * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved. */ /* * Copyright (c) 1983,1984,1985,1986,1987,1988,1989 AT&T. * All Rights Reserved */ + /* + * Copyright 2019 Nexenta Systems, Inc. + * Copyright (c) 2012, 2016 by Delphix. All rights reserved. + */ + #include <sys/param.h> #include <sys/types.h> #include <sys/systm.h> #include <sys/cred.h> #include <sys/buf.h>
*** 55,77 **** --- 58,83 ---- #include <sys/policy.h> #include <sys/fem.h> #include <sys/sdt.h> #include <sys/ddi.h> #include <sys/zone.h> + #include <sys/kstat.h> #include <fs/fs_reparse.h> #include <rpc/types.h> #include <rpc/auth.h> #include <rpc/rpcsec_gss.h> #include <rpc/svc.h> #include <nfs/nfs.h> + #include <nfs/nfssys.h> #include <nfs/export.h> #include <nfs/nfs_cmd.h> #include <nfs/lm.h> #include <nfs/nfs4.h> + #include <nfs/nfs4_drc.h> #include <sys/strsubr.h> #include <sys/strsun.h> #include <inet/common.h>
*** 145,164 **** * */ #define DIRENT64_TO_DIRCOUNT(dp) \ (3 * BYTES_PER_XDR_UNIT + DIRENT64_NAMELEN((dp)->d_reclen)) ! time_t rfs4_start_time; /* Initialized in rfs4_srvrinit */ static sysid_t lockt_sysid; /* dummy sysid for all LOCKT calls */ u_longlong_t nfs4_srv_caller_id; uint_t nfs4_srv_vkey = 0; - verifier4 Write4verf; - verifier4 Readdir4verf; - void rfs4_init_compound_state(struct compound_state *); static void nullfree(caddr_t); static void rfs4_op_inval(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); --- 151,167 ---- * */ #define DIRENT64_TO_DIRCOUNT(dp) \ (3 * BYTES_PER_XDR_UNIT + DIRENT64_NAMELEN((dp)->d_reclen)) ! zone_key_t rfs4_zone_key; static sysid_t lockt_sysid; /* dummy sysid for all LOCKT calls */ u_longlong_t nfs4_srv_caller_id; uint_t nfs4_srv_vkey = 0; void rfs4_init_compound_state(struct compound_state *); static void nullfree(caddr_t); static void rfs4_op_inval(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *);
*** 243,257 **** struct svc_req *req, struct compound_state *); static void rfs4_op_secinfo(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); static void rfs4_op_secinfo_free(nfs_resop4 *); ! static nfsstat4 check_open_access(uint32_t, ! struct compound_state *, struct svc_req *); nfsstat4 rfs4_client_sysid(rfs4_client_t *, sysid_t *); ! void rfs4_ss_clid(rfs4_client_t *); /* * translation table for attrs */ struct nfs4_ntov_table { union nfs4_attr_u *na; --- 246,261 ---- struct svc_req *req, struct compound_state *); static void rfs4_op_secinfo(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); static void rfs4_op_secinfo_free(nfs_resop4 *); ! static nfsstat4 check_open_access(uint32_t, struct compound_state *, ! struct svc_req *); nfsstat4 rfs4_client_sysid(rfs4_client_t *, sysid_t *); ! void rfs4_ss_clid(nfs4_srv_t *, rfs4_client_t *); + /* * translation table for attrs */ struct nfs4_ntov_table { union nfs4_attr_u *na;
*** 266,416 **** static nfsstat4 do_rfs4_set_attrs(bitmap4 *resp, fattr4 *fattrp, struct compound_state *cs, struct nfs4_svgetit_arg *sargp, struct nfs4_ntov_table *ntovp, nfs4_attr_cmd_t cmd); fem_t *deleg_rdops; fem_t *deleg_wrops; - rfs4_servinst_t *rfs4_cur_servinst = NULL; /* current server instance */ - kmutex_t rfs4_servinst_lock; /* protects linked list */ - int rfs4_seen_first_compound; /* set first time we see one */ - /* * NFS4 op dispatch table */ struct rfsv4disp { void (*dis_proc)(); /* proc to call */ void (*dis_resfree)(); /* frees space allocated by proc */ int dis_flags; /* RPC_IDEMPOTENT, etc... */ }; static struct rfsv4disp rfsv4disptab[] = { /* * NFS VERSION 4 */ /* RFS_NULL = 0 */ ! {rfs4_op_illegal, nullfree, 0}, /* UNUSED = 1 */ ! {rfs4_op_illegal, nullfree, 0}, /* UNUSED = 2 */ ! {rfs4_op_illegal, nullfree, 0}, /* OP_ACCESS = 3 */ ! {rfs4_op_access, nullfree, RPC_IDEMPOTENT}, /* OP_CLOSE = 4 */ ! {rfs4_op_close, nullfree, 0}, /* OP_COMMIT = 5 */ ! {rfs4_op_commit, nullfree, RPC_IDEMPOTENT}, /* OP_CREATE = 6 */ ! {rfs4_op_create, nullfree, 0}, /* OP_DELEGPURGE = 7 */ ! {rfs4_op_delegpurge, nullfree, 0}, /* OP_DELEGRETURN = 8 */ ! {rfs4_op_delegreturn, nullfree, 0}, /* OP_GETATTR = 9 */ ! {rfs4_op_getattr, rfs4_op_getattr_free, RPC_IDEMPOTENT}, /* OP_GETFH = 10 */ ! {rfs4_op_getfh, rfs4_op_getfh_free, RPC_ALL}, /* OP_LINK = 11 */ ! {rfs4_op_link, nullfree, 0}, /* OP_LOCK = 12 */ ! {rfs4_op_lock, lock_denied_free, 0}, /* OP_LOCKT = 13 */ ! {rfs4_op_lockt, lock_denied_free, 0}, /* OP_LOCKU = 14 */ ! {rfs4_op_locku, nullfree, 0}, /* OP_LOOKUP = 15 */ ! {rfs4_op_lookup, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK)}, /* OP_LOOKUPP = 16 */ ! {rfs4_op_lookupp, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK)}, /* OP_NVERIFY = 17 */ ! {rfs4_op_nverify, nullfree, RPC_IDEMPOTENT}, /* OP_OPEN = 18 */ ! {rfs4_op_open, rfs4_free_reply, 0}, /* OP_OPENATTR = 19 */ ! {rfs4_op_openattr, nullfree, 0}, /* OP_OPEN_CONFIRM = 20 */ ! {rfs4_op_open_confirm, nullfree, 0}, /* OP_OPEN_DOWNGRADE = 21 */ ! {rfs4_op_open_downgrade, nullfree, 0}, /* OP_OPEN_PUTFH = 22 */ ! {rfs4_op_putfh, nullfree, RPC_ALL}, /* OP_PUTPUBFH = 23 */ ! {rfs4_op_putpubfh, nullfree, RPC_ALL}, /* OP_PUTROOTFH = 24 */ ! {rfs4_op_putrootfh, nullfree, RPC_ALL}, /* OP_READ = 25 */ ! {rfs4_op_read, rfs4_op_read_free, RPC_IDEMPOTENT}, /* OP_READDIR = 26 */ ! {rfs4_op_readdir, rfs4_op_readdir_free, RPC_IDEMPOTENT}, /* OP_READLINK = 27 */ ! {rfs4_op_readlink, rfs4_op_readlink_free, RPC_IDEMPOTENT}, /* OP_REMOVE = 28 */ ! {rfs4_op_remove, nullfree, 0}, /* OP_RENAME = 29 */ ! {rfs4_op_rename, nullfree, 0}, /* OP_RENEW = 30 */ ! {rfs4_op_renew, nullfree, 0}, /* OP_RESTOREFH = 31 */ ! {rfs4_op_restorefh, nullfree, RPC_ALL}, /* OP_SAVEFH = 32 */ ! {rfs4_op_savefh, nullfree, RPC_ALL}, /* OP_SECINFO = 33 */ ! {rfs4_op_secinfo, rfs4_op_secinfo_free, 0}, /* OP_SETATTR = 34 */ ! {rfs4_op_setattr, nullfree, 0}, /* OP_SETCLIENTID = 35 */ ! {rfs4_op_setclientid, nullfree, 0}, /* OP_SETCLIENTID_CONFIRM = 36 */ ! {rfs4_op_setclientid_confirm, nullfree, 0}, /* OP_VERIFY = 37 */ ! {rfs4_op_verify, nullfree, RPC_IDEMPOTENT}, /* OP_WRITE = 38 */ ! {rfs4_op_write, nullfree, 0}, /* OP_RELEASE_LOCKOWNER = 39 */ ! 
{rfs4_op_release_lockowner, nullfree, 0}, }; static uint_t rfsv4disp_cnt = sizeof (rfsv4disptab) / sizeof (rfsv4disptab[0]); #define OP_ILLEGAL_IDX (rfsv4disp_cnt) --- 270,452 ---- static nfsstat4 do_rfs4_set_attrs(bitmap4 *resp, fattr4 *fattrp, struct compound_state *cs, struct nfs4_svgetit_arg *sargp, struct nfs4_ntov_table *ntovp, nfs4_attr_cmd_t cmd); + static void hanfsv4_failover(nfs4_srv_t *); + fem_t *deleg_rdops; fem_t *deleg_wrops; /* * NFS4 op dispatch table */ struct rfsv4disp { void (*dis_proc)(); /* proc to call */ void (*dis_resfree)(); /* frees space allocated by proc */ int dis_flags; /* RPC_IDEMPOTENT, etc... */ + int op_type; /* operation type, see below */ }; + /* + * operation types; used primarily for the per-exportinfo kstat implementation + */ + #define NFS4_OP_NOFH 0 /* The operation does not operate with any */ + /* particular filehandle; we cannot associate */ + /* it with any exportinfo. */ + + #define NFS4_OP_CFH 1 /* The operation works with the current */ + /* filehandle; we associate the operation */ + /* with the exportinfo related to the current */ + /* filehandle (as set before the operation is */ + /* executed). */ + + #define NFS4_OP_SFH 2 /* The operation works with the saved */ + /* filehandle; we associate the operation */ + /* with the exportinfo related to the saved */ + /* filehandle (as set before the operation is */ + /* executed). */ + + #define NFS4_OP_POSTCFH 3 /* The operation ignores the current */ + /* filehandle, but sets the new current */ + /* filehandle instead; we associate the */ + /* operation with the exportinfo related to */ + /* the current filehandle as set after the */ + /* operation is successfuly executed. Since */ + /* we do not know the particular exportinfo */ + /* (and thus the kstat) before the operation */ + /* is done, there is no simple way how to */ + /* update some I/O kstat statistics related */ + /* to kstat_queue(9F). */ + static struct rfsv4disp rfsv4disptab[] = { /* * NFS VERSION 4 */ /* RFS_NULL = 0 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* UNUSED = 1 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* UNUSED = 2 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* OP_ACCESS = 3 */ ! {rfs4_op_access, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_CLOSE = 4 */ ! {rfs4_op_close, nullfree, 0, NFS4_OP_CFH}, /* OP_COMMIT = 5 */ ! {rfs4_op_commit, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_CREATE = 6 */ ! {rfs4_op_create, nullfree, 0, NFS4_OP_CFH}, /* OP_DELEGPURGE = 7 */ ! {rfs4_op_delegpurge, nullfree, 0, NFS4_OP_NOFH}, /* OP_DELEGRETURN = 8 */ ! {rfs4_op_delegreturn, nullfree, 0, NFS4_OP_CFH}, /* OP_GETATTR = 9 */ ! {rfs4_op_getattr, rfs4_op_getattr_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_GETFH = 10 */ ! {rfs4_op_getfh, rfs4_op_getfh_free, RPC_ALL, NFS4_OP_CFH}, /* OP_LINK = 11 */ ! {rfs4_op_link, nullfree, 0, NFS4_OP_CFH}, /* OP_LOCK = 12 */ ! {rfs4_op_lock, lock_denied_free, 0, NFS4_OP_CFH}, /* OP_LOCKT = 13 */ ! {rfs4_op_lockt, lock_denied_free, 0, NFS4_OP_CFH}, /* OP_LOCKU = 14 */ ! {rfs4_op_locku, nullfree, 0, NFS4_OP_CFH}, /* OP_LOOKUP = 15 */ ! {rfs4_op_lookup, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK), ! NFS4_OP_CFH}, /* OP_LOOKUPP = 16 */ ! {rfs4_op_lookupp, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK), ! NFS4_OP_CFH}, /* OP_NVERIFY = 17 */ ! {rfs4_op_nverify, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_OPEN = 18 */ ! {rfs4_op_open, rfs4_free_reply, 0, NFS4_OP_CFH}, /* OP_OPENATTR = 19 */ ! {rfs4_op_openattr, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_CONFIRM = 20 */ ! 
{rfs4_op_open_confirm, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_DOWNGRADE = 21 */ ! {rfs4_op_open_downgrade, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_PUTFH = 22 */ ! {rfs4_op_putfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_PUTPUBFH = 23 */ ! {rfs4_op_putpubfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_PUTROOTFH = 24 */ ! {rfs4_op_putrootfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_READ = 25 */ ! {rfs4_op_read, rfs4_op_read_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_READDIR = 26 */ ! {rfs4_op_readdir, rfs4_op_readdir_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_READLINK = 27 */ ! {rfs4_op_readlink, rfs4_op_readlink_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_REMOVE = 28 */ ! {rfs4_op_remove, nullfree, 0, NFS4_OP_CFH}, /* OP_RENAME = 29 */ ! {rfs4_op_rename, nullfree, 0, NFS4_OP_CFH}, /* OP_RENEW = 30 */ ! {rfs4_op_renew, nullfree, 0, NFS4_OP_NOFH}, /* OP_RESTOREFH = 31 */ ! {rfs4_op_restorefh, nullfree, RPC_ALL, NFS4_OP_SFH}, /* OP_SAVEFH = 32 */ ! {rfs4_op_savefh, nullfree, RPC_ALL, NFS4_OP_CFH}, /* OP_SECINFO = 33 */ ! {rfs4_op_secinfo, rfs4_op_secinfo_free, 0, NFS4_OP_CFH}, /* OP_SETATTR = 34 */ ! {rfs4_op_setattr, nullfree, 0, NFS4_OP_CFH}, /* OP_SETCLIENTID = 35 */ ! {rfs4_op_setclientid, nullfree, 0, NFS4_OP_NOFH}, /* OP_SETCLIENTID_CONFIRM = 36 */ ! {rfs4_op_setclientid_confirm, nullfree, 0, NFS4_OP_NOFH}, /* OP_VERIFY = 37 */ ! {rfs4_op_verify, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_WRITE = 38 */ ! {rfs4_op_write, nullfree, 0, NFS4_OP_CFH}, /* OP_RELEASE_LOCKOWNER = 39 */ ! {rfs4_op_release_lockowner, nullfree, 0, NFS4_OP_NOFH}, }; static uint_t rfsv4disp_cnt = sizeof (rfsv4disptab) / sizeof (rfsv4disptab[0]); #define OP_ILLEGAL_IDX (rfsv4disp_cnt)
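The new op_type column classifies each NFSv4 operation by the filehandle it acts on, which is what lets rfs4_compound() charge the per-share (per-exportinfo) kstat of the right export. The stand-alone user-space sketch below shows the selection rule the table implies; select_exi(), the reduced exportinfo_t, and the demo values are illustrative only and not part of the patch.

/*
 * Sketch of how an operation's type picks the exportinfo whose per-share
 * kstat is charged.  cur_exi/saved_exi mirror cs.exi/cs.saved_exi in the
 * patch; for POSTCFH ops the export is only known after the op succeeds.
 */
#include <stdio.h>

typedef enum {
	NFS4_OP_NOFH,		/* no filehandle: charge no share */
	NFS4_OP_CFH,		/* charge the current filehandle's share */
	NFS4_OP_SFH,		/* charge the saved filehandle's share */
	NFS4_OP_POSTCFH		/* charge the share the op itself sets */
} op_type_t;

typedef struct exportinfo {
	const char *path;	/* placeholder for the kernel's exportinfo */
} exportinfo_t;

static exportinfo_t *
select_exi(op_type_t type, exportinfo_t *cur_exi, exportinfo_t *saved_exi,
    int op_succeeded)
{
	switch (type) {
	case NFS4_OP_CFH:
		return (cur_exi);
	case NFS4_OP_SFH:
		return (saved_exi);
	case NFS4_OP_POSTCFH:
		/* e.g. PUTFH: charged to the current fh the op installed */
		return (op_succeeded ? cur_exi : NULL);
	default:
		return (NULL);		/* NFS4_OP_NOFH: no share to charge */
	}
}

int
main(void)
{
	exportinfo_t cur = { "/export/a" }, saved = { "/export/b" };

	printf("CFH op -> %s\n", select_exi(NFS4_OP_CFH, &cur, &saved, 1)->path);
	printf("SFH op -> %s\n", select_exi(NFS4_OP_SFH, &cur, &saved, 1)->path);
	return (0);
}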
*** 464,474 **** "rfs4_op_release_lockowner", "rfs4_op_illegal" }; #endif ! void rfs4_ss_chkclid(rfs4_client_t *); extern size_t strlcpy(char *dst, const char *src, size_t dstsize); extern void rfs4_free_fs_locations4(fs_locations4 *); --- 500,510 ---- "rfs4_op_release_lockowner", "rfs4_op_illegal" }; #endif ! void rfs4_ss_chkclid(nfs4_srv_t *, rfs4_client_t *); extern size_t strlcpy(char *dst, const char *src, size_t dstsize); extern void rfs4_free_fs_locations4(fs_locations4 *);
*** 497,514 **** VOPNAME_SETSECATTR, { .femop_setsecattr = deleg_wr_setsecattr }, VOPNAME_VNEVENT, { .femop_vnevent = deleg_wr_vnevent }, NULL, NULL }; ! int ! rfs4_srvrinit(void) { timespec32_t verf; - int error; - extern void rfs4_attr_init(); - extern krwlock_t rfs4_deleg_policy_lock; /* * The following algorithm attempts to find a unique verifier * to be used as the write verifier returned from the server * to the client. It is important that this verifier change * whenever the server reboots. Of secondary importance, it --- 533,551 ---- VOPNAME_SETSECATTR, { .femop_setsecattr = deleg_wr_setsecattr }, VOPNAME_VNEVENT, { .femop_vnevent = deleg_wr_vnevent }, NULL, NULL }; ! /* ARGSUSED */ ! static void * ! rfs4_zone_init(zoneid_t zoneid) { + nfs4_srv_t *nsrv4; timespec32_t verf; + nsrv4 = kmem_zalloc(sizeof (*nsrv4), KM_SLEEP); + /* * The following algorithm attempts to find a unique verifier * to be used as the write verifier returned from the server * to the client. It is important that this verifier change * whenever the server reboots. Of secondary importance, it
*** 533,605 **** gethrestime(&tverf); verf.tv_sec = (time_t)tverf.tv_sec; verf.tv_nsec = tverf.tv_nsec; } ! Write4verf = *(uint64_t *)&verf; ! rfs4_attr_init(); ! mutex_init(&rfs4_deleg_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage create/destroy of server state */ ! mutex_init(&rfs4_state_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage access to server instance linked list */ ! mutex_init(&rfs4_servinst_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage access to rfs4_deleg_policy */ ! rw_init(&rfs4_deleg_policy_lock, NULL, RW_DEFAULT, NULL); ! error = fem_create("deleg_rdops", nfs4_rd_deleg_tmpl, &deleg_rdops); ! if (error != 0) { rfs4_disable_delegation(); ! } else { ! error = fem_create("deleg_wrops", nfs4_wr_deleg_tmpl, ! &deleg_wrops); ! if (error != 0) { rfs4_disable_delegation(); fem_free(deleg_rdops); } - } nfs4_srv_caller_id = fs_new_caller_id(); - lockt_sysid = lm_alloc_sysidt(); - vsd_create(&nfs4_srv_vkey, NULL); ! ! return (0); } void rfs4_srvrfini(void) { - extern krwlock_t rfs4_deleg_policy_lock; - if (lockt_sysid != LM_NOSYSID) { lm_free_sysidt(lockt_sysid); lockt_sysid = LM_NOSYSID; } ! mutex_destroy(&rfs4_deleg_lock); ! mutex_destroy(&rfs4_state_lock); ! rw_destroy(&rfs4_deleg_policy_lock); fem_free(deleg_rdops); fem_free(deleg_wrops); } void rfs4_init_compound_state(struct compound_state *cs) { bzero(cs, sizeof (*cs)); cs->cont = TRUE; cs->access = CS_ACCESS_DENIED; cs->deleg = FALSE; cs->mandlock = FALSE; cs->fh.nfs_fh4_val = cs->fhbuf; } void rfs4_grace_start(rfs4_servinst_t *sip) { --- 570,689 ---- gethrestime(&tverf); verf.tv_sec = (time_t)tverf.tv_sec; verf.tv_nsec = tverf.tv_nsec; } + nsrv4->write4verf = *(uint64_t *)&verf; ! /* Used to manage create/destroy of server state */ ! nsrv4->nfs4_server_state = NULL; ! nsrv4->nfs4_cur_servinst = NULL; ! nsrv4->nfs4_deleg_policy = SRV_NEVER_DELEGATE; ! mutex_init(&nsrv4->deleg_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&nsrv4->state_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&nsrv4->servinst_lock, NULL, MUTEX_DEFAULT, NULL); ! rw_init(&nsrv4->deleg_policy_lock, NULL, RW_DEFAULT, NULL); ! return (nsrv4); ! } ! /* ARGSUSED */ ! static void ! rfs4_zone_fini(zoneid_t zoneid, void *data) ! { ! nfs4_srv_t *nsrv4 = data; ! mutex_destroy(&nsrv4->deleg_lock); ! mutex_destroy(&nsrv4->state_lock); ! mutex_destroy(&nsrv4->servinst_lock); ! rw_destroy(&nsrv4->deleg_policy_lock); ! kmem_free(nsrv4, sizeof (*nsrv4)); ! } ! void ! rfs4_srvrinit(void) ! { ! extern void rfs4_attr_init(); ! ! zone_key_create(&rfs4_zone_key, rfs4_zone_init, NULL, rfs4_zone_fini); ! ! rfs4_attr_init(); ! ! ! if (fem_create("deleg_rdops", nfs4_rd_deleg_tmpl, &deleg_rdops) != 0) { rfs4_disable_delegation(); ! } else if (fem_create("deleg_wrops", nfs4_wr_deleg_tmpl, ! &deleg_wrops) != 0) { rfs4_disable_delegation(); fem_free(deleg_rdops); } nfs4_srv_caller_id = fs_new_caller_id(); lockt_sysid = lm_alloc_sysidt(); vsd_create(&nfs4_srv_vkey, NULL); ! rfs4_state_g_init(); } void rfs4_srvrfini(void) { if (lockt_sysid != LM_NOSYSID) { lm_free_sysidt(lockt_sysid); lockt_sysid = LM_NOSYSID; } ! rfs4_state_g_fini(); fem_free(deleg_rdops); fem_free(deleg_wrops); + + (void) zone_key_delete(rfs4_zone_key); } void + rfs4_do_server_start(int server_upordown, + int srv_delegation, int cluster_booted) + { + nfs4_srv_t *nsrv4 = zone_getspecific(rfs4_zone_key, curzone); + + /* Is this a warm start? 
*/ + if (server_upordown == NFS_SERVER_QUIESCED) { + cmn_err(CE_NOTE, "nfs4_srv: " + "server was previously quiesced; " + "existing NFSv4 state will be re-used"); + + /* + * HA-NFSv4: this is also the signal + * that a Resource Group failover has + * occurred. + */ + if (cluster_booted) + hanfsv4_failover(nsrv4); + } else { + /* Cold start */ + nsrv4->rfs4_start_time = 0; + rfs4_state_zone_init(nsrv4); + nsrv4->nfs4_drc = rfs4_init_drc(nfs4_drc_max, + nfs4_drc_hash); + } + + /* Check if delegation is to be enabled */ + if (srv_delegation != FALSE) + rfs4_set_deleg_policy(nsrv4, SRV_NORMAL_DELEGATE); + } + + void rfs4_init_compound_state(struct compound_state *cs) { bzero(cs, sizeof (*cs)); cs->cont = TRUE; cs->access = CS_ACCESS_DENIED; cs->deleg = FALSE; cs->mandlock = FALSE; cs->fh.nfs_fh4_val = cs->fhbuf; + cs->statusp = NULL; } void rfs4_grace_start(rfs4_servinst_t *sip) {
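The rework above replaces the old file-scope server state with a per-zone nfs4_srv_t managed through the zone-specific-data (ZSD) interfaces: rfs4_zone_init() builds the state when a zone is created, rfs4_zone_fini() tears it down when the zone goes away, and consumers fetch their own zone's copy with zone_getspecific(). The condensed kernel-C sketch below restates that lifecycle; the example_* names and the reduced structure are stand-ins, while zone_key_create(), zone_getspecific(), kmem_zalloc() and the mutex calls are the same DDI interfaces the hunk uses. It is a fragment of the pattern, not an independently buildable module.

/*
 * Per-zone state via ZSD, as used above: a create callback allocates the
 * state for each zone, a destroy callback frees it, and runtime code looks
 * the state up with zone_getspecific() instead of touching globals.
 */
#include <sys/zone.h>
#include <sys/kmem.h>
#include <sys/mutex.h>

static zone_key_t example_zone_key;	/* stands in for rfs4_zone_key */

typedef struct example_srv {
	kmutex_t	state_lock;
	uint64_t	write4verf;	/* per-zone write verifier */
} example_srv_t;

/* ARGSUSED */
static void *
example_zone_init(zoneid_t zoneid)
{
	example_srv_t *srv = kmem_zalloc(sizeof (*srv), KM_SLEEP);

	mutex_init(&srv->state_lock, NULL, MUTEX_DEFAULT, NULL);
	return (srv);			/* becomes this zone's ZSD value */
}

/* ARGSUSED */
static void
example_zone_fini(zoneid_t zoneid, void *data)
{
	example_srv_t *srv = data;

	mutex_destroy(&srv->state_lock);
	kmem_free(srv, sizeof (*srv));
}

void
example_srvrinit(void)
{
	/* create callback, no shutdown callback, destroy callback */
	zone_key_create(&example_zone_key, example_zone_init, NULL,
	    example_zone_fini);
}

void
example_op(void)
{
	example_srv_t *srv = zone_getspecific(example_zone_key, curzone);

	mutex_enter(&srv->state_lock);
	/* ... operate on this zone's server state only ... */
	mutex_exit(&srv->state_lock);
}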
*** 650,687 **** /* * reset all currently active grace periods */ void ! rfs4_grace_reset_all(void) { rfs4_servinst_t *sip; ! mutex_enter(&rfs4_servinst_lock); ! for (sip = rfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_in_grace(sip)) rfs4_grace_start(sip); ! mutex_exit(&rfs4_servinst_lock); } /* * start any new instances' grace periods */ void ! rfs4_grace_start_new(void) { rfs4_servinst_t *sip; ! mutex_enter(&rfs4_servinst_lock); ! for (sip = rfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_grace_new(sip)) rfs4_grace_start(sip); ! mutex_exit(&rfs4_servinst_lock); } static rfs4_dss_path_t * ! rfs4_dss_newpath(rfs4_servinst_t *sip, char *path, unsigned index) { size_t len; rfs4_dss_path_t *dss_path; dss_path = kmem_alloc(sizeof (rfs4_dss_path_t), KM_SLEEP); --- 734,772 ---- /* * reset all currently active grace periods */ void ! rfs4_grace_reset_all(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip; ! mutex_enter(&nsrv4->servinst_lock); ! for (sip = nsrv4->nfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_in_grace(sip)) rfs4_grace_start(sip); ! mutex_exit(&nsrv4->servinst_lock); } /* * start any new instances' grace periods */ void ! rfs4_grace_start_new(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip; ! mutex_enter(&nsrv4->servinst_lock); ! for (sip = nsrv4->nfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_grace_new(sip)) rfs4_grace_start(sip); ! mutex_exit(&nsrv4->servinst_lock); } static rfs4_dss_path_t * ! rfs4_dss_newpath(nfs4_srv_t *nsrv4, rfs4_servinst_t *sip, ! char *path, unsigned index) { size_t len; rfs4_dss_path_t *dss_path; dss_path = kmem_alloc(sizeof (rfs4_dss_path_t), KM_SLEEP);
*** 701,719 **** /* * Add to list of served paths. * No locking required, as we're only ever called at startup. */ ! if (rfs4_dss_pathlist == NULL) { /* this is the first dss_path_t */ /* needed for insque/remque */ dss_path->next = dss_path->prev = dss_path; ! rfs4_dss_pathlist = dss_path; } else { ! insque(dss_path, rfs4_dss_pathlist); } return (dss_path); } --- 786,804 ---- /* * Add to list of served paths. * No locking required, as we're only ever called at startup. */ ! if (nsrv4->dss_pathlist == NULL) { /* this is the first dss_path_t */ /* needed for insque/remque */ dss_path->next = dss_path->prev = dss_path; ! nsrv4->dss_pathlist = dss_path; } else { ! insque(dss_path, nsrv4->dss_pathlist); } return (dss_path); }
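rfs4_dss_newpath() threads each served path onto a circular doubly-linked list headed by nsrv4->dss_pathlist: the first element is linked to itself and later elements are insque()d after the head, which is why hanfsv4_failover() further down can walk the ring with a do/while loop until it is back at the head. The small user-space program below illustrates the same idiom with insque(3C); path_node_t, add_path() and the sample paths are illustrative names only.

/*
 * Circular list via insque(), as used for dss_pathlist.  insque() requires
 * the forward and backward pointers to be the first two members.
 */
#include <search.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct path_node {
	struct path_node *next;		/* must be the first two members */
	struct path_node *prev;
	char *path;
} path_node_t;

static path_node_t *
add_path(path_node_t **headp, const char *path)
{
	path_node_t *n = calloc(1, sizeof (*n));

	n->path = strdup(path);
	if (*headp == NULL) {
		n->next = n->prev = n;	/* first element points at itself */
		*headp = n;
	} else {
		insque(n, *headp);	/* link in after the head */
	}
	return (n);
}

int
main(void)
{
	path_node_t *head = NULL, *n;

	(void) add_path(&head, "/var/nfs/v4_state");
	(void) add_path(&head, "/pool1/rg1");
	(void) add_path(&head, "/pool2/rg2");

	n = head;
	do {				/* walk until back at the head */
		printf("%s\n", n->path);
		n = n->next;
	} while (n != head);
	return (0);
}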
*** 721,731 **** * Create a new server instance, and make it the currently active instance. * Note that starting the grace period too early will reduce the clients' * recovery window. */ void ! rfs4_servinst_create(int start_grace, int dss_npaths, char **dss_paths) { unsigned i; rfs4_servinst_t *sip; rfs4_oldstate_t *oldstate; --- 806,817 ---- * Create a new server instance, and make it the currently active instance. * Note that starting the grace period too early will reduce the clients' * recovery window. */ void ! rfs4_servinst_create(nfs4_srv_t *nsrv4, int start_grace, ! int dss_npaths, char **dss_paths) { unsigned i; rfs4_servinst_t *sip; rfs4_oldstate_t *oldstate;
*** 752,794 **** sip->dss_npaths = dss_npaths; sip->dss_paths = kmem_alloc(dss_npaths * sizeof (rfs4_dss_path_t *), KM_SLEEP); for (i = 0; i < dss_npaths; i++) { ! sip->dss_paths[i] = rfs4_dss_newpath(sip, dss_paths[i], i); } ! mutex_enter(&rfs4_servinst_lock); ! if (rfs4_cur_servinst != NULL) { /* add to linked list */ ! sip->prev = rfs4_cur_servinst; ! rfs4_cur_servinst->next = sip; } if (start_grace) rfs4_grace_start(sip); /* make the new instance "current" */ ! rfs4_cur_servinst = sip; ! mutex_exit(&rfs4_servinst_lock); } /* * In future, we might add a rfs4_servinst_destroy(sip) but, for now, destroy * all instances directly. */ void ! rfs4_servinst_destroy_all(void) { rfs4_servinst_t *sip, *prev, *current; #ifdef DEBUG int n = 0; #endif ! mutex_enter(&rfs4_servinst_lock); ! ASSERT(rfs4_cur_servinst != NULL); ! current = rfs4_cur_servinst; ! rfs4_cur_servinst = NULL; for (sip = current; sip != NULL; sip = prev) { prev = sip->prev; rw_destroy(&sip->rwlock); if (sip->oldstate) kmem_free(sip->oldstate, sizeof (rfs4_oldstate_t)); --- 838,881 ---- sip->dss_npaths = dss_npaths; sip->dss_paths = kmem_alloc(dss_npaths * sizeof (rfs4_dss_path_t *), KM_SLEEP); for (i = 0; i < dss_npaths; i++) { ! /* CSTYLED */ ! sip->dss_paths[i] = rfs4_dss_newpath(nsrv4, sip, dss_paths[i], i); } ! mutex_enter(&nsrv4->servinst_lock); ! if (nsrv4->nfs4_cur_servinst != NULL) { /* add to linked list */ ! sip->prev = nsrv4->nfs4_cur_servinst; ! nsrv4->nfs4_cur_servinst->next = sip; } if (start_grace) rfs4_grace_start(sip); /* make the new instance "current" */ ! nsrv4->nfs4_cur_servinst = sip; ! mutex_exit(&nsrv4->servinst_lock); } /* * In future, we might add a rfs4_servinst_destroy(sip) but, for now, destroy * all instances directly. */ void ! rfs4_servinst_destroy_all(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip, *prev, *current; #ifdef DEBUG int n = 0; #endif ! mutex_enter(&nsrv4->servinst_lock); ! ASSERT(nsrv4->nfs4_cur_servinst != NULL); ! current = nsrv4->nfs4_cur_servinst; ! nsrv4->nfs4_cur_servinst = NULL; for (sip = current; sip != NULL; sip = prev) { prev = sip->prev; rw_destroy(&sip->rwlock); if (sip->oldstate) kmem_free(sip->oldstate, sizeof (rfs4_oldstate_t));
*** 798,826 **** kmem_free(sip, sizeof (rfs4_servinst_t)); #ifdef DEBUG n++; #endif } ! mutex_exit(&rfs4_servinst_lock); } /* * Assign the current server instance to a client_t. * Should be called with cp->rc_dbe held. */ void ! rfs4_servinst_assign(rfs4_client_t *cp, rfs4_servinst_t *sip) { ASSERT(rfs4_dbe_refcnt(cp->rc_dbe) > 0); /* * The lock ensures that if the current instance is in the process * of changing, we will see the new one. */ ! mutex_enter(&rfs4_servinst_lock); cp->rc_server_instance = sip; ! mutex_exit(&rfs4_servinst_lock); } rfs4_servinst_t * rfs4_servinst(rfs4_client_t *cp) { --- 885,914 ---- kmem_free(sip, sizeof (rfs4_servinst_t)); #ifdef DEBUG n++; #endif } ! mutex_exit(&nsrv4->servinst_lock); } /* * Assign the current server instance to a client_t. * Should be called with cp->rc_dbe held. */ void ! rfs4_servinst_assign(nfs4_srv_t *nsrv4, rfs4_client_t *cp, ! rfs4_servinst_t *sip) { ASSERT(rfs4_dbe_refcnt(cp->rc_dbe) > 0); /* * The lock ensures that if the current instance is in the process * of changing, we will see the new one. */ ! mutex_enter(&nsrv4->servinst_lock); cp->rc_server_instance = sip; ! mutex_exit(&nsrv4->servinst_lock); } rfs4_servinst_t * rfs4_servinst(rfs4_client_t *cp) {
*** 877,886 **** --- 965,975 ---- secinfo4 *resok_val; struct secinfo *secp; seconfig_t *si; bool_t did_traverse = FALSE; int dotdot, walk; + nfs_export_t *ne = nfs_get_export(); dvp = cs->vp; dotdot = (nm[0] == '.' && nm[1] == '.' && nm[2] == '\0'); /*
*** 898,908 **** /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(dvp, rootdir)) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */ --- 987,997 ---- /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(dvp, ZONE_ROOTVP())) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */
*** 1015,1025 **** * * Return all flavors for a pseudo node. * For a real export node, return the flavor that the client * has access with. */ ! ASSERT(RW_LOCK_HELD(&exported_lock)); if (PSEUDO(exi)) { count = exi->exi_export.ex_seccnt; /* total sec count */ resok_val = kmem_alloc(count * sizeof (secinfo4), KM_SLEEP); secp = exi->exi_export.ex_secinfo; --- 1104,1114 ---- * * Return all flavors for a pseudo node. * For a real export node, return the flavor that the client * has access with. */ ! ASSERT(RW_LOCK_HELD(&ne->exported_lock)); if (PSEUDO(exi)) { count = exi->exi_export.ex_seccnt; /* total sec count */ resok_val = kmem_alloc(count * sizeof (secinfo4), KM_SLEEP); secp = exi->exi_export.ex_secinfo;
*** 1378,1387 **** --- 1467,1477 ---- COMMIT4res *resp = &resop->nfs_resop4_u.opcommit; int error; vnode_t *vp = cs->vp; cred_t *cr = cs->cr; vattr_t va; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__commit__start, struct compound_state *, cs, COMMIT4args *, args); if (vp == NULL) {
*** 1434,1445 **** if (error) { *cs->statusp = resp->status = puterrno4(error); goto out; } *cs->statusp = resp->status = NFS4_OK; ! resp->writeverf = Write4verf; out: DTRACE_NFSV4_2(op__commit__done, struct compound_state *, cs, COMMIT4res *, resp); } --- 1524,1536 ---- if (error) { *cs->statusp = resp->status = puterrno4(error); goto out; } + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); *cs->statusp = resp->status = NFS4_OK; ! resp->writeverf = nsrv4->write4verf; out: DTRACE_NFSV4_2(op__commit__done, struct compound_state *, cs, COMMIT4res *, resp); }
*** 2643,2653 **** /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(cs->vp, rootdir)) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */ --- 2734,2744 ---- /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(cs->vp, ZONE_ROOTVP())) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */
*** 3407,3416 **** --- 3498,3508 ---- PUTPUBFH4res *resp = &resop->nfs_resop4_u.opputpubfh; int error; vnode_t *vp; struct exportinfo *exi, *sav_exi; nfs_fh4_fmt_t *fh_fmtp; + nfs_export_t *ne = nfs_get_export(); DTRACE_NFSV4_1(op__putpubfh__start, struct compound_state *, cs); if (cs->vp) { VN_RELE(cs->vp);
*** 3420,3442 **** if (cs->cr) crfree(cs->cr); cs->cr = crdup(cs->basecr); ! vp = exi_public->exi_vp; if (vp == NULL) { *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; } ! error = makefh4(&cs->fh, vp, exi_public); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; ! if (exi_public == exi_root) { /* * No filesystem is actually shared public, so we default * to exi_root. In this case, we must check whether root * is exported. */ --- 3512,3534 ---- if (cs->cr) crfree(cs->cr); cs->cr = crdup(cs->basecr); ! vp = ne->exi_public->exi_vp; if (vp == NULL) { *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; } ! error = makefh4(&cs->fh, vp, ne->exi_public); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; ! if (ne->exi_public == ne->exi_root) { /* * No filesystem is actually shared public, so we default * to exi_root. In this case, we must check whether root * is exported. */
*** 3447,3462 **** * should use is what checkexport4 returns, because root_exi is * actually a mostly empty struct. */ exi = checkexport4(&fh_fmtp->fh4_fsid, (fid_t *)&fh_fmtp->fh4_xlen, NULL); ! cs->exi = ((exi != NULL) ? exi : exi_public); } else { /* * it's a properly shared filesystem */ ! cs->exi = exi_public; } if (is_system_labeled()) { bslabel_t *clabel; --- 3539,3554 ---- * should use is what checkexport4 returns, because root_exi is * actually a mostly empty struct. */ exi = checkexport4(&fh_fmtp->fh4_fsid, (fid_t *)&fh_fmtp->fh4_xlen, NULL); ! cs->exi = ((exi != NULL) ? exi : ne->exi_public); } else { /* * it's a properly shared filesystem */ ! cs->exi = ne->exi_public; } if (is_system_labeled()) { bslabel_t *clabel;
*** 3527,3537 **** if (cs->cr) { crfree(cs->cr); cs->cr = NULL; } - if (args->object.nfs_fh4_len < NFS_FH4_LEN) { *cs->statusp = resp->status = NFS4ERR_BADHANDLE; goto out; } --- 3619,3628 ----
*** 3594,3604 **** * Using rootdir, the system root vnode, * get its fid. */ bzero(&fid, sizeof (fid)); fid.fid_len = MAXFIDSZ; ! error = vop_fid_pseudo(rootdir, &fid); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } --- 3685,3695 ---- * Using rootdir, the system root vnode, * get its fid. */ bzero(&fid, sizeof (fid)); fid.fid_len = MAXFIDSZ; ! error = vop_fid_pseudo(ZONE_ROOTVP(), &fid); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; }
*** 3608,3618 **** * If the server root isn't exported directly, then * it should at least be a pseudo export based on * one or more exports further down in the server's * file tree. */ ! exi = checkexport4(&rootdir->v_vfsp->vfs_fsid, &fid, NULL); if (exi == NULL || exi->exi_export.ex_flags & EX_PUBLIC) { NFS4_DEBUG(rfs4_debug, (CE_WARN, "rfs4_op_putrootfh: export check failure")); *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; --- 3699,3709 ---- * If the server root isn't exported directly, then * it should at least be a pseudo export based on * one or more exports further down in the server's * file tree. */ ! exi = checkexport4(&ZONE_ROOTVP()->v_vfsp->vfs_fsid, &fid, NULL); if (exi == NULL || exi->exi_export.ex_flags & EX_PUBLIC) { NFS4_DEBUG(rfs4_debug, (CE_WARN, "rfs4_op_putrootfh: export check failure")); *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out;
*** 3620,3643 **** /* * Now make a filehandle based on the root * export and root vnode. */ ! error = makefh4(&cs->fh, rootdir, exi); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; cs->exi = exi; ! VN_HOLD(rootdir); ! cs->vp = rootdir; if ((resp->status = call_checkauth4(cs, req)) != NFS4_OK) { ! VN_RELE(rootdir); cs->vp = NULL; cs->exi = sav_exi; goto out; } --- 3711,3734 ---- /* * Now make a filehandle based on the root * export and root vnode. */ ! error = makefh4(&cs->fh, ZONE_ROOTVP(), exi); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; cs->exi = exi; ! VN_HOLD(ZONE_ROOTVP()); ! cs->vp = ZONE_ROOTVP(); if ((resp->status = call_checkauth4(cs, req)) != NFS4_OK) { ! VN_RELE(cs->vp); cs->vp = NULL; cs->exi = sav_exi; goto out; }
*** 4244,4254 **** * not ENOTEMPTY, if the directory is not * empty. A System V NFS server needs to map * NFS4ERR_EXIST to NFS4ERR_NOTEMPTY to * transmit over the wire. */ ! if ((error = VOP_RMDIR(dvp, name, rootdir, cs->cr, NULL, 0)) == EEXIST) error = ENOTEMPTY; } } else { if ((error = VOP_REMOVE(dvp, name, cs->cr, NULL, 0)) == 0 && --- 4335,4345 ---- * not ENOTEMPTY, if the directory is not * empty. A System V NFS server needs to map * NFS4ERR_EXIST to NFS4ERR_NOTEMPTY to * transmit over the wire. */ ! if ((error = VOP_RMDIR(dvp, name, ZONE_ROOTVP(), cs->cr, NULL, 0)) == EEXIST) error = ENOTEMPTY; } } else { if ((error = VOP_REMOVE(dvp, name, cs->cr, NULL, 0)) == 0 &&
*** 4356,4373 **** RENAME4args *args = &argop->nfs_argop4_u.oprename; RENAME4res *resp = &resop->nfs_resop4_u.oprename; int error; vnode_t *odvp; vnode_t *ndvp; ! vnode_t *srcvp, *targvp; struct vattr obdva, oidva, oadva; struct vattr nbdva, nidva, nadva; char *onm, *nnm; uint_t olen, nlen; rfs4_file_t *fp, *sfp; int in_crit_src, in_crit_targ; int fp_rele_grant_hold, sfp_rele_grant_hold; bslabel_t *clabel; struct sockaddr *ca; char *converted_onm = NULL; char *converted_nnm = NULL; nfsstat4 status; --- 4447,4465 ---- RENAME4args *args = &argop->nfs_argop4_u.oprename; RENAME4res *resp = &resop->nfs_resop4_u.oprename; int error; vnode_t *odvp; vnode_t *ndvp; ! vnode_t *srcvp, *targvp, *tvp; struct vattr obdva, oidva, oadva; struct vattr nbdva, nidva, nadva; char *onm, *nnm; uint_t olen, nlen; rfs4_file_t *fp, *sfp; int in_crit_src, in_crit_targ; int fp_rele_grant_hold, sfp_rele_grant_hold; + int unlinked; bslabel_t *clabel; struct sockaddr *ca; char *converted_onm = NULL; char *converted_nnm = NULL; nfsstat4 status;
*** 4374,4386 **** DTRACE_NFSV4_2(op__rename__start, struct compound_state *, cs, RENAME4args *, args); fp = sfp = NULL; ! srcvp = targvp = NULL; in_crit_src = in_crit_targ = 0; fp_rele_grant_hold = sfp_rele_grant_hold = 0; /* CURRENT_FH: target directory */ ndvp = cs->vp; if (ndvp == NULL) { *cs->statusp = resp->status = NFS4ERR_NOFILEHANDLE; --- 4466,4479 ---- DTRACE_NFSV4_2(op__rename__start, struct compound_state *, cs, RENAME4args *, args); fp = sfp = NULL; ! srcvp = targvp = tvp = NULL; in_crit_src = in_crit_targ = 0; fp_rele_grant_hold = sfp_rele_grant_hold = 0; + unlinked = 0; /* CURRENT_FH: target directory */ ndvp = cs->vp; if (ndvp == NULL) { *cs->statusp = resp->status = NFS4ERR_NOFILEHANDLE;
*** 4549,4559 **** goto err_out; } } fp_rele_grant_hold = 1; - /* Check for NBMAND lock on both source and target */ if (nbl_need_check(srcvp)) { nbl_start_crit(srcvp, RW_READER); in_crit_src = 1; if (nbl_conflict(srcvp, NBL_RENAME, 0, 0, 0, NULL)) { --- 4642,4651 ----
*** 4584,4618 **** } NFS4_SET_FATTR4_CHANGE(resp->source_cinfo.before, obdva.va_ctime) NFS4_SET_FATTR4_CHANGE(resp->target_cinfo.before, nbdva.va_ctime) ! if ((error = VOP_RENAME(odvp, converted_onm, ndvp, converted_nnm, ! cs->cr, NULL, 0)) == 0 && fp != NULL) { ! struct vattr va; ! vnode_t *tvp; rfs4_dbe_lock(fp->rf_dbe); tvp = fp->rf_vp; if (tvp) VN_HOLD(tvp); rfs4_dbe_unlock(fp->rf_dbe); if (tvp) { va.va_mask = AT_NLINK; if (!VOP_GETATTR(tvp, &va, 0, cs->cr, NULL) && va.va_nlink == 0) { ! /* The file is gone and so should the state */ ! if (in_crit_targ) { ! nbl_end_crit(targvp); ! in_crit_targ = 0; } ! rfs4_close_all_state(fp); ! } VN_RELE(tvp); } } if (error == 0) vn_renamepath(ndvp, srcvp, nnm, nlen - 1); if (in_crit_src) nbl_end_crit(srcvp); --- 4676,4720 ---- } NFS4_SET_FATTR4_CHANGE(resp->source_cinfo.before, obdva.va_ctime) NFS4_SET_FATTR4_CHANGE(resp->target_cinfo.before, nbdva.va_ctime) ! error = VOP_RENAME(odvp, converted_onm, ndvp, converted_nnm, cs->cr, ! NULL, 0); + /* + * If target existed and was unlinked by VOP_RENAME, state will need + * closed. To avoid deadlock, rfs4_close_all_state will be done after + * any necessary nbl_end_crit on srcvp and tgtvp. + */ + if (error == 0 && fp != NULL) { rfs4_dbe_lock(fp->rf_dbe); tvp = fp->rf_vp; if (tvp) VN_HOLD(tvp); rfs4_dbe_unlock(fp->rf_dbe); if (tvp) { + struct vattr va; va.va_mask = AT_NLINK; + if (!VOP_GETATTR(tvp, &va, 0, cs->cr, NULL) && va.va_nlink == 0) { ! unlinked = 1; ! ! /* DEBUG data */ ! if ((srcvp == targvp) || (tvp != targvp)) { ! cmn_err(CE_WARN, "rfs4_op_rename: " ! "srcvp %p, targvp: %p, tvp: %p", ! (void *)srcvp, (void *)targvp, ! (void *)tvp); } ! } else { VN_RELE(tvp); } } + } if (error == 0) vn_renamepath(ndvp, srcvp, nnm, nlen - 1); if (in_crit_src) nbl_end_crit(srcvp);
*** 4621,4630 **** --- 4723,4747 ---- if (in_crit_targ) nbl_end_crit(targvp); if (targvp) VN_RELE(targvp); + if (unlinked) { + ASSERT(fp != NULL); + ASSERT(tvp != NULL); + + /* DEBUG data */ + if (RW_READ_HELD(&tvp->v_nbllock)) { + cmn_err(CE_WARN, "rfs4_op_rename: " + "RW_READ_HELD(%p)", (void *)tvp); + } + + /* The file is gone and so should the state */ + rfs4_close_all_state(fp); + VN_RELE(tvp); + } + if (sfp) { rfs4_clear_dont_grant(sfp); rfs4_file_rele(sfp); } if (fp) {
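This hunk is the NEX-15740 deadlock fix itself: VOP_RENAME() no longer leads straight into rfs4_close_all_state() while the NBMAND critical regions on the source and target vnodes are still held. Instead the code only records that the target was unlinked (keeping a hold on tvp), drops nbl_end_crit() on both vnodes, and tears the state down afterwards. The kernel-C fragment below condenses that ordering; rename_teardown_sketch() is an illustrative helper that reuses only calls visible in the hunk and omits the DEBUG checks and error paths.

/*
 * Condensed ordering of the fix: leave every nbmand critical region
 * before closing the NFSv4 state for the unlinked target file.
 */
static void
rename_teardown_sketch(vnode_t *srcvp, vnode_t *targvp, vnode_t *tvp,
    rfs4_file_t *fp, int in_crit_src, int in_crit_targ, int unlinked)
{
	if (in_crit_src)
		nbl_end_crit(srcvp);
	if (srcvp)
		VN_RELE(srcvp);
	if (in_crit_targ)
		nbl_end_crit(targvp);
	if (targvp)
		VN_RELE(targvp);

	if (unlinked) {
		/* safe now: no critical region is held across this call */
		rfs4_close_all_state(fp);
		VN_RELE(tvp);
	}
}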
*** 5557,5566 **** --- 5674,5684 ---- cred_t *savecred, *cr; bool_t *deleg = &cs->deleg; nfsstat4 stat; int in_crit = 0; caller_context_t ct; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__write__start, struct compound_state *, cs, WRITE4args *, args); vp = cs->vp;
*** 5627,5641 **** if (MANDLOCK(vp, bva.va_mode)) { *cs->statusp = resp->status = NFS4ERR_ACCESS; goto out; } if (args->data_len == 0) { *cs->statusp = resp->status = NFS4_OK; resp->count = 0; resp->committed = args->stable; ! resp->writeverf = Write4verf; goto out; } if (args->mblk != NULL) { mblk_t *m; --- 5745,5760 ---- if (MANDLOCK(vp, bva.va_mode)) { *cs->statusp = resp->status = NFS4ERR_ACCESS; goto out; } + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); if (args->data_len == 0) { *cs->statusp = resp->status = NFS4_OK; resp->count = 0; resp->committed = args->stable; ! resp->writeverf = nsrv4->write4verf; goto out; } if (args->mblk != NULL) { mblk_t *m;
*** 5727,5737 **** if (ioflag == 0) resp->committed = UNSTABLE4; else resp->committed = FILE_SYNC4; ! resp->writeverf = Write4verf; out: if (in_crit) nbl_end_crit(vp); --- 5846,5856 ---- if (ioflag == 0) resp->committed = UNSTABLE4; else resp->committed = FILE_SYNC4; ! resp->writeverf = nsrv4->write4verf; out: if (in_crit) nbl_end_crit(vp);
*** 5747,5756 **** --- 5866,5877 ---- rfs4_compound(COMPOUND4args *args, COMPOUND4res *resp, struct exportinfo *exi, struct svc_req *req, cred_t *cr, int *rv) { uint_t i; struct compound_state cs; + nfs4_srv_t *nsrv4; + nfs_export_t *ne = nfs_get_export(); if (rv != NULL) *rv = 0; rfs4_init_compound_state(&cs); /*
*** 5804,5813 **** --- 5925,5935 ---- resp->array_len = args->array_len; resp->array = kmem_zalloc(args->array_len * sizeof (nfs_resop4), KM_SLEEP); cs.basecr = cr; + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); DTRACE_NFSV4_2(compound__start, struct compound_state *, &cs, COMPOUND4args *, args); /*
*** 5818,5841 **** * per proc (excluding public exinfo), and exi_count design * is sufficient to protect concurrent execution of NFS2/3 * ops along with unexport. This lock will be removed as * part of the NFSv4 phase 2 namespace redesign work. */ ! rw_enter(&exported_lock, RW_READER); /* * If this is the first compound we've seen, we need to start all * new instances' grace periods. */ ! if (rfs4_seen_first_compound == 0) { ! rfs4_grace_start_new(); /* * This must be set after rfs4_grace_start_new(), otherwise * another thread could proceed past here before the former * is finished. */ ! rfs4_seen_first_compound = 1; } for (i = 0; i < args->array_len && cs.cont; i++) { nfs_argop4 *argop; nfs_resop4 *resop; --- 5940,5963 ---- * per proc (excluding public exinfo), and exi_count design * is sufficient to protect concurrent execution of NFS2/3 * ops along with unexport. This lock will be removed as * part of the NFSv4 phase 2 namespace redesign work. */ ! rw_enter(&ne->exported_lock, RW_READER); /* * If this is the first compound we've seen, we need to start all * new instances' grace periods. */ ! if (nsrv4->seen_first_compound == 0) { ! rfs4_grace_start_new(nsrv4); /* * This must be set after rfs4_grace_start_new(), otherwise * another thread could proceed past here before the former * is finished. */ ! nsrv4->seen_first_compound = 1; } for (i = 0; i < args->array_len && cs.cont; i++) { nfs_argop4 *argop; nfs_resop4 *resop;
*** 5845,5868 **** --- 5967,6052 ---- resop = &resp->array[i]; resop->resop = argop->argop; op = (uint_t)resop->resop; if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + kstat_t *exi_ksp = NULL; + /* * Count the individual ops here; NULL and COMPOUND * are counted in common_dispatch() */ rfsproccnt_v4_ptr[op].value.ui64++; + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + kstat_runq_enter(KSTAT_IO_PTR(ksp)); + mutex_exit(ksp->ks_lock); + } + + switch (rfsv4disptab[op].op_type) { + case NFS4_OP_CFH: + resop->exi = cs.exi; + break; + case NFS4_OP_SFH: + resop->exi = cs.saved_exi; + break; + default: + ASSERT(resop->exi == NULL); + break; + } + + if (resop->exi != NULL) { + exi_ksp = NULL; + if (resop->exi->exi_kstats != NULL) { + exi_ksp = exp_kstats_v4( + resop->exi->exi_kstats, op); + } + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + kstat_runq_enter(KSTAT_IO_PTR(exi_ksp)); + mutex_exit(exi_ksp->ks_lock); + } + } + NFS4_DEBUG(rfs4_debug > 1, (CE_NOTE, "Executing %s", rfs4_op_string[op])); (*rfsv4disptab[op].dis_proc)(argop, resop, req, &cs); NFS4_DEBUG(rfs4_debug > 1, (CE_NOTE, "%s returned %d", rfs4_op_string[op], *cs.statusp)); if (*cs.statusp != NFS4_OK) cs.cont = FALSE; + + if (rfsv4disptab[op].op_type == NFS4_OP_POSTCFH && + *cs.statusp == NFS4_OK && + (resop->exi = cs.exi) != NULL) { + exi_ksp = NULL; + if (resop->exi->exi_kstats != NULL) { + exi_ksp = exp_kstats_v4( + resop->exi->exi_kstats, op); + } + } + + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + KSTAT_IO_PTR(exi_ksp)->nwritten += + argop->opsize; + KSTAT_IO_PTR(exi_ksp)->writes++; + if (rfsv4disptab[op].op_type != NFS4_OP_POSTCFH) + kstat_runq_exit(KSTAT_IO_PTR(exi_ksp)); + mutex_exit(exi_ksp->ks_lock); } else { + resop->exi = NULL; + } + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + kstat_runq_exit(KSTAT_IO_PTR(ksp)); + mutex_exit(ksp->ks_lock); + } + } else { /* * This is effectively dead code since XDR code * will have already returned BADXDR if op doesn't * decode to legal value. This only done for a * day when XDR code doesn't verify v4 opcodes.
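Each dispatched operation is now bracketed by kstat_runq_enter()/kstat_runq_exit() on the per-operation I/O kstat, and on the per-export kstat when the op_type lets the code identify one, so the kstat_queue(9F) machinery accumulates run-queue (service) time per op and per share. The kernel-C sketch below isolates that bracketing; op_with_runq_accounting() and the dis_proc parameter are illustrative, while kstat_runq_enter(), kstat_runq_exit(), KSTAT_IO_PTR() and ks_lock are the interfaces the hunk uses, and ksp is assumed to be a KSTAT_TYPE_IO kstat created elsewhere.

/*
 * kstat_queue(9F) bracketing around one operation: mark the request as in
 * service before dispatch and take it out of the run queue afterwards,
 * holding ks_lock around each kstat update.
 */
#include <sys/kstat.h>

static void
op_with_runq_accounting(kstat_t *ksp, void (*dis_proc)(void))
{
	if (ksp != NULL) {
		mutex_enter(ksp->ks_lock);
		kstat_runq_enter(KSTAT_IO_PTR(ksp));	/* op enters service */
		mutex_exit(ksp->ks_lock);
	}

	dis_proc();				/* execute the operation */

	if (ksp != NULL) {
		mutex_enter(ksp->ks_lock);
		kstat_runq_exit(KSTAT_IO_PTR(ksp));	/* accumulates run time */
		mutex_exit(ksp->ks_lock);
	}
}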
*** 5873,5907 **** rfs4_op_illegal(argop, resop, req, &cs); cs.cont = FALSE; } /* * If not at last op, and if we are to stop, then * compact the results array. */ if ((i + 1) < args->array_len && !cs.cont) { nfs_resop4 *new_res = kmem_alloc( ! (i+1) * sizeof (nfs_resop4), KM_SLEEP); bcopy(resp->array, ! new_res, (i+1) * sizeof (nfs_resop4)); kmem_free(resp->array, args->array_len * sizeof (nfs_resop4)); resp->array_len = i + 1; resp->array = new_res; } } ! rw_exit(&exported_lock); ! DTRACE_NFSV4_2(compound__done, struct compound_state *, &cs, ! COMPOUND4res *, resp); ! if (cs.vp) VN_RELE(cs.vp); if (cs.saved_vp) VN_RELE(cs.saved_vp); if (cs.saved_fh.nfs_fh4_val) kmem_free(cs.saved_fh.nfs_fh4_val, NFS4_FHSIZE); if (cs.basecr) crfree(cs.basecr); --- 6057,6106 ---- rfs4_op_illegal(argop, resop, req, &cs); cs.cont = FALSE; } /* + * The exi saved in the resop to be used for kstats update + * once the opsize is calculated during XDR response encoding. + * Put a hold on resop->exi so that it can't be destroyed. + */ + if (resop->exi != NULL) + exi_hold(resop->exi); + + /* * If not at last op, and if we are to stop, then * compact the results array. */ if ((i + 1) < args->array_len && !cs.cont) { nfs_resop4 *new_res = kmem_alloc( ! (i + 1) * sizeof (nfs_resop4), KM_SLEEP); bcopy(resp->array, ! new_res, (i + 1) * sizeof (nfs_resop4)); kmem_free(resp->array, args->array_len * sizeof (nfs_resop4)); resp->array_len = i + 1; resp->array = new_res; } } ! rw_exit(&ne->exported_lock); ! /* ! * clear exportinfo and vnode fields from compound_state before dtrace ! * probe, to avoid tracing residual values for path and share path. ! */ if (cs.vp) VN_RELE(cs.vp); if (cs.saved_vp) VN_RELE(cs.saved_vp); + cs.exi = cs.saved_exi = NULL; + cs.vp = cs.saved_vp = NULL; + + DTRACE_NFSV4_2(compound__done, struct compound_state *, &cs, + COMPOUND4res *, resp); + if (cs.saved_fh.nfs_fh4_val) kmem_free(cs.saved_fh.nfs_fh4_val, NFS4_FHSIZE); if (cs.basecr) crfree(cs.basecr);
*** 5967,5976 **** --- 6166,6262 ---- flag = 0; } *flagp = flag; } + /* + * Update the kstats for the received requests. + * Note: writes/nwritten are used to hold count and nbytes of requests received. + * + * Per export request statistics need to be updated during the compound request + * processing (rfs4_compound()) as that is where it is known which exportinfo to + * associate the kstats with. + */ + void + rfs4_compound_kstat_args(COMPOUND4args *args) + { + int i; + + for (i = 0; i < args->array_len; i++) { + uint_t op = (uint_t)args->array[i].argop; + + if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + KSTAT_IO_PTR(ksp)->nwritten += + args->array[i].opsize; + KSTAT_IO_PTR(ksp)->writes++; + mutex_exit(ksp->ks_lock); + } + } + } + } + + /* + * Update the kstats for the sent responses. + * Note: reads/nread are used to hold count and nbytes of responses sent. + * + * Per export response statistics cannot be updated until here, after the + * response send has generated the opsize (bytes sent) in the XDR encoding. + * The exportinfo with which the kstats should be associated is thus saved + * in the response structure (by rfs4_compound()) for use here. A hold is + * placed on the exi to ensure it cannot be deleted before use. This hold + * is released, and the exi set to NULL, here. + */ + void + rfs4_compound_kstat_res(COMPOUND4res *res) + { + int i; + nfs_export_t *ne = nfs_get_export(); + + for (i = 0; i < res->array_len; i++) { + uint_t op = (uint_t)res->array[i].resop; + + if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + struct exportinfo *exi = res->array[i].exi; + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + KSTAT_IO_PTR(ksp)->nread += + res->array[i].opsize; + KSTAT_IO_PTR(ksp)->reads++; + mutex_exit(ksp->ks_lock); + } + + if (exi != NULL) { + kstat_t *exi_ksp = NULL; + + rw_enter(&ne->exported_lock, RW_READER); + + if (exi->exi_kstats != NULL) { + /*CSTYLED*/ + exi_ksp = exp_kstats_v4(exi->exi_kstats, op); + } + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + KSTAT_IO_PTR(exi_ksp)->nread += + res->array[i].opsize; + KSTAT_IO_PTR(exi_ksp)->reads++; + mutex_exit(exi_ksp->ks_lock); + } + + exi_rele(&exi); + res->array[i].exi = NULL; + rw_exit(&ne->exported_lock); + } + } + } + } + nfsstat4 rfs4_client_sysid(rfs4_client_t *cp, sysid_t *sp) { nfsstat4 e;
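As the comments above note, the I/O kstats are used asymmetrically: writes/nwritten count requests received and reads/nread count responses sent (response sizes are only known after XDR encoding, which is why the exportinfo is carried in the resop with a hold). The user-space libkstat sketch below shows how such a KSTAT_TYPE_IO kstat could be read back; the module and name strings ("nfs", "rfsprocio_v4_read") are placeholders, since the actual kstat names are created outside this diff.

/*
 * Reading one of the per-operation I/O kstats from userland (sketch).
 * Build with: cc -o dumpks dumpks.c -lkstat
 */
#include <kstat.h>
#include <stdio.h>

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();
	kstat_t *ksp;
	kstat_io_t kio;

	if (kc == NULL)
		return (1);
	/* placeholder module/name -- substitute the real kstat names */
	ksp = kstat_lookup(kc, "nfs", -1, "rfsprocio_v4_read");
	if (ksp == NULL || kstat_read(kc, ksp, &kio) == -1) {
		(void) kstat_close(kc);
		return (1);
	}
	printf("requests: %llu (%llu bytes)  responses: %llu (%llu bytes)\n",
	    (u_longlong_t)kio.writes, (u_longlong_t)kio.nwritten,
	    (u_longlong_t)kio.reads, (u_longlong_t)kio.nread);
	(void) kstat_close(kc);
	return (0);
}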
*** 6601,6629 **** */ if (trunc) { int in_crit = 0; rfs4_file_t *fp; bool_t create = FALSE; /* * We are writing over an existing file. * Check to see if we need to recall a delegation. */ ! rfs4_hold_deleg_policy(); if ((fp = rfs4_findfile(vp, NULL, &create)) != NULL) { if (rfs4_check_delegated_byfp(FWRITE, fp, (reqsize == 0), FALSE, FALSE, &clientid)) { rfs4_file_rele(fp); ! rfs4_rele_deleg_policy(); VN_RELE(vp); *attrset = 0; return (NFS4ERR_DELAY); } rfs4_file_rele(fp); } ! rfs4_rele_deleg_policy(); if (nbl_need_check(vp)) { in_crit = 1; ASSERT(reqsize == 0); --- 6887,6917 ---- */ if (trunc) { int in_crit = 0; rfs4_file_t *fp; + nfs4_srv_t *nsrv4; bool_t create = FALSE; /* * We are writing over an existing file. * Check to see if we need to recall a delegation. */ ! nsrv4 = zone_getspecific(rfs4_zone_key, curzone); ! rfs4_hold_deleg_policy(nsrv4); if ((fp = rfs4_findfile(vp, NULL, &create)) != NULL) { if (rfs4_check_delegated_byfp(FWRITE, fp, (reqsize == 0), FALSE, FALSE, &clientid)) { rfs4_file_rele(fp); ! rfs4_rele_deleg_policy(nsrv4); VN_RELE(vp); *attrset = 0; return (NFS4ERR_DELAY); } rfs4_file_rele(fp); } ! rfs4_rele_deleg_policy(nsrv4); if (nbl_need_check(vp)) { in_crit = 1; ASSERT(reqsize == 0);
*** 8177,8191 **** --- 8465,8481 ---- SETCLIENTID_CONFIRM4args *args = &argop->nfs_argop4_u.opsetclientid_confirm; SETCLIENTID_CONFIRM4res *res = &resop->nfs_resop4_u.opsetclientid_confirm; rfs4_client_t *cp, *cptoclose = NULL; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__setclientid__confirm__start, struct compound_state *, cs, SETCLIENTID_CONFIRM4args *, args); + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); *cs->statusp = res->status = NFS4_OK; cp = rfs4_findclient_by_id(args->clientid, TRUE); if (cp == NULL) {
*** 8217,8234 **** /* * Update the client's associated server instance, if it's changed * since the client was created. */ ! if (rfs4_servinst(cp) != rfs4_cur_servinst) ! rfs4_servinst_assign(cp, rfs4_cur_servinst); /* * Record clientid in stable storage. * Must be done after server instance has been assigned. */ ! rfs4_ss_clid(cp); rfs4_dbe_unlock(cp->rc_dbe); if (cptoclose) /* don't need to rele, client_close does it */ --- 8507,8524 ---- /* * Update the client's associated server instance, if it's changed * since the client was created. */ ! if (rfs4_servinst(cp) != nsrv4->nfs4_cur_servinst) ! rfs4_servinst_assign(nsrv4, cp, nsrv4->nfs4_cur_servinst); /* * Record clientid in stable storage. * Must be done after server instance has been assigned. */ ! rfs4_ss_clid(nsrv4, cp); rfs4_dbe_unlock(cp->rc_dbe); if (cptoclose) /* don't need to rele, client_close does it */
*** 8239,8249 **** rfs4_update_lease(cp); /* * Check to see if client can perform reclaims */ ! rfs4_ss_chkclid(cp); rfs4_client_rele(cp); out: DTRACE_NFSV4_2(op__setclientid__confirm__done, --- 8529,8539 ---- rfs4_update_lease(cp); /* * Check to see if client can perform reclaims */ ! rfs4_ss_chkclid(nsrv4, cp); rfs4_client_rele(cp); out: DTRACE_NFSV4_2(op__setclientid__confirm__done,
*** 9883,9888 **** --- 10173,10342 ---- if (ci == NULL) return (0); is_downrev = ci->ri_no_referrals; rfs4_dbe_rele(ci->ri_dbe); return (is_downrev); + } + + /* + * Do the main work of handling HA-NFSv4 Resource Group failover on + * Sun Cluster. + * We need to detect whether any RG admin paths have been added or removed, + * and adjust resources accordingly. + * Currently we're using a very inefficient algorithm, ~ 2 * O(n**2). In + * order to scale, the list and array of paths need to be held in more + * suitable data structures. + */ + static void + hanfsv4_failover(nfs4_srv_t *nsrv4) + { + int i, start_grace, numadded_paths = 0; + char **added_paths = NULL; + rfs4_dss_path_t *dss_path; + + /* + * Note: currently, dss_pathlist cannot be NULL, since + * it will always include an entry for NFS4_DSS_VAR_DIR. If we + * make the latter dynamically specified too, the following will + * need to be adjusted. + */ + + /* + * First, look for removed paths: RGs that have been failed-over + * away from this node. + * Walk the "currently-serving" dss_pathlist and, for each + * path, check if it is on the "passed-in" rfs4_dss_newpaths array + * from nfsd. If not, that RG path has been removed. + * + * Note that nfsd has sorted rfs4_dss_newpaths for us, and removed + * any duplicates. + */ + dss_path = nsrv4->dss_pathlist; + do { + int found = 0; + char *path = dss_path->path; + + /* used only for non-HA so may not be removed */ + if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) { + dss_path = dss_path->next; + continue; + } + + for (i = 0; i < rfs4_dss_numnewpaths; i++) { + int cmpret; + char *newpath = rfs4_dss_newpaths[i]; + + /* + * Since nfsd has sorted rfs4_dss_newpaths for us, + * once the return from strcmp is negative we know + * we've passed the point where "path" should be, + * and can stop searching: "path" has been removed. + */ + cmpret = strcmp(path, newpath); + if (cmpret < 0) + break; + if (cmpret == 0) { + found = 1; + break; + } + } + + if (found == 0) { + unsigned index = dss_path->index; + rfs4_servinst_t *sip = dss_path->sip; + rfs4_dss_path_t *path_next = dss_path->next; + + /* + * This path has been removed. + * We must clear out the servinst reference to + * it, since it's now owned by another + * node: we should not attempt to touch it. + */ + ASSERT(dss_path == sip->dss_paths[index]); + sip->dss_paths[index] = NULL; + + /* remove from "currently-serving" list, and destroy */ + remque(dss_path); + /* allow for NUL */ + kmem_free(dss_path->path, strlen(dss_path->path) + 1); + kmem_free(dss_path, sizeof (rfs4_dss_path_t)); + + dss_path = path_next; + } else { + /* path was found; not removed */ + dss_path = dss_path->next; + } + } while (dss_path != nsrv4->dss_pathlist); + + /* + * Now, look for added paths: RGs that have been failed-over + * to this node. + * Walk the "passed-in" rfs4_dss_newpaths array from nfsd and, + * for each path, check if it is on the "currently-serving" + * dss_pathlist. If not, that RG path has been added. + * + * Note: we don't do duplicate detection here; nfsd does that for us. + * + * Note: numadded_paths <= rfs4_dss_numnewpaths, which gives us + * an upper bound for the size needed for added_paths[numadded_paths]. 
+ */ + + /* probably more space than we need, but guaranteed to be enough */ + if (rfs4_dss_numnewpaths > 0) { + size_t sz = rfs4_dss_numnewpaths * sizeof (char *); + added_paths = kmem_zalloc(sz, KM_SLEEP); + } + + /* walk the "passed-in" rfs4_dss_newpaths array from nfsd */ + for (i = 0; i < rfs4_dss_numnewpaths; i++) { + int found = 0; + char *newpath = rfs4_dss_newpaths[i]; + + dss_path = nsrv4->dss_pathlist; + do { + char *path = dss_path->path; + + /* used only for non-HA */ + if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) { + dss_path = dss_path->next; + continue; + } + + if (strncmp(path, newpath, strlen(path)) == 0) { + found = 1; + break; + } + + dss_path = dss_path->next; + } while (dss_path != nsrv4->dss_pathlist); + + if (found == 0) { + added_paths[numadded_paths] = newpath; + numadded_paths++; + } + } + + /* did we find any added paths? */ + if (numadded_paths > 0) { + + /* create a new server instance, and start its grace period */ + start_grace = 1; + /* CSTYLED */ + rfs4_servinst_create(nsrv4, start_grace, numadded_paths, added_paths); + + /* read in the stable storage state from these paths */ + rfs4_dss_readstate(nsrv4, numadded_paths, added_paths); + + /* + * Multiple failovers during a grace period will cause + * clients of the same resource group to be partitioned + * into different server instances, with different + * grace periods. Since clients of the same resource + * group must be subject to the same grace period, + * we need to reset all currently active grace periods. + */ + rfs4_grace_reset_all(nsrv4); + } + + if (rfs4_dss_numnewpaths > 0) + kmem_free(added_paths, rfs4_dss_numnewpaths * sizeof (char *)); }
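The removed-path scan above leans on nfsd having sorted rfs4_dss_newpaths: as soon as strcmp(path, newpath) goes negative, every remaining candidate sorts after path, so the path cannot appear later in the array and the loop can stop early. The small user-space example below isolates that early-exit membership test; path_in_sorted_list() and the sample paths are illustrative only.

/*
 * Early-exit membership test over a sorted array, as used to detect
 * resource-group paths that have failed over away from this node.
 */
#include <stdio.h>
#include <string.h>

static int
path_in_sorted_list(const char *path, const char *const *sorted, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int cmp = strcmp(path, sorted[i]);

		if (cmp == 0)
			return (1);	/* still served by this node */
		if (cmp < 0)
			return (0);	/* passed its sort position: removed */
	}
	return (0);
}

int
main(void)
{
	const char *newpaths[] = { "/pool1/rg1", "/pool2/rg2", "/pool3/rg3" };

	printf("%d\n", path_in_sorted_list("/pool2/rg2", newpaths, 3));
	printf("%d\n", path_in_sorted_list("/pool9/rg9", newpaths, 3));
	return (0);
}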