NEX-15740 NFS deadlock in rfs4_compound with hundreds of threads waiting for lock owned by rfs4_op_rename (lint fix)
NEX-15740 NFS deadlock in rfs4_compound with hundreds of threads waiting for lock owned by rfs4_op_rename
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-16917 Need to reduce the impact of NFS per-share kstats on failover
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16835 Kernel panic during BDD tests at rfs4_compound func
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-15924 Getting panic: BAD TRAP: type=d (#gp General protection) rp=ffffff0021464690 addr=12
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16812 Timing window where dtrace probe could try to access share info after unshared
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16452 NFS server in a zone state database needs to be per zone
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-15279 support NFS server in zone
NEX-15520 online NFS shares cause zoneadm halt to hang in nfs_export_zone_fini
Portions contributed by: Dan Kruchinin <dan.kruchinin@nexenta.com>
Portions contributed by: Stepan Zastupov <stepan.zastupov@gmail.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-9275 Got "bad mutex" panic when run IO to nfs share from clients
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-7366 Getting panic in "module "nfssrv" due to a NULL pointer dereference" when updating NFS shares on a pool
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-6778 NFS kstats leak and cause system to hang
Revert "NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats"
This reverts commit 586c3ab1927647487f01c337ddc011c642575a52.
Revert "NEX-5354 Aggregated IOPS, bandwidth, and latency kstats for NFS server"
This reverts commit c91d7614da8618ef48018102b077f60ecbbac8c2.
Revert "NEX-5667 nfssrv_stats_flags does not work for aggregated kstats"
This reverts commit 3dcf42618be7dd5f408c327f429c81e07ca08e74.
Revert "NEX-5750 Time values for aggregated NFS server kstats should be normalized"
This reverts commit 1f4d4f901153b0191027969fa4a8064f9d3b9ee1.
Revert "NEX-5942 Panic in rfs4_minorvers_mismatch() with NFSv4.1 client"
This reverts commit 40766417094a162f5e4cc8786c0fa0a7e5871cd9.
Revert "NEX-5752 NFS server: namespace collision in kstats"
This reverts commit ae81e668db86050da8e483264acb0cce0444a132.
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6109 NFS client panics in nfssrv when running nfsv4-test basic_ops STC tests
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-5134 Deadlock between rfs4_do_lock() and rfs4_op_read()
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3311 NFSv4: setlock() can spin forever
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3097 IOPS, bandwidth, and latency kstats for NFS server
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-1128 NFS server: Generic uid and gid remapping for AUTH_SYS
Reviewed by: Jan Kryl <jan.kryl@nexenta.com>
OS-72 NULL pointer dereference in rfs4_op_setclientid()
Reviewed by: Dan McDonald <danmcd@nexenta.com>

*** 18,37 **** * * CDDL HEADER END */ /* - * Copyright 2016 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2016 by Delphix. All rights reserved. */ /* * Copyright (c) 1983,1984,1985,1986,1987,1988,1989 AT&T. * All Rights Reserved */ #include <sys/param.h> #include <sys/types.h> #include <sys/systm.h> #include <sys/cred.h> #include <sys/buf.h> --- 18,40 ---- * * CDDL HEADER END */ /* * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved. */ /* * Copyright (c) 1983,1984,1985,1986,1987,1988,1989 AT&T. * All Rights Reserved */ + /* + * Copyright 2019 Nexenta Systems, Inc. + * Copyright (c) 2012, 2016 by Delphix. All rights reserved. + */ + #include <sys/param.h> #include <sys/types.h> #include <sys/systm.h> #include <sys/cred.h> #include <sys/buf.h>
*** 55,77 **** --- 58,83 ---- #include <sys/policy.h> #include <sys/fem.h> #include <sys/sdt.h> #include <sys/ddi.h> #include <sys/zone.h> + #include <sys/kstat.h> #include <fs/fs_reparse.h> #include <rpc/types.h> #include <rpc/auth.h> #include <rpc/rpcsec_gss.h> #include <rpc/svc.h> #include <nfs/nfs.h> + #include <nfs/nfssys.h> #include <nfs/export.h> #include <nfs/nfs_cmd.h> #include <nfs/lm.h> #include <nfs/nfs4.h> + #include <nfs/nfs4_drc.h> #include <sys/strsubr.h> #include <sys/strsun.h> #include <inet/common.h>
*** 145,164 **** * */ #define DIRENT64_TO_DIRCOUNT(dp) \ (3 * BYTES_PER_XDR_UNIT + DIRENT64_NAMELEN((dp)->d_reclen)) ! time_t rfs4_start_time; /* Initialized in rfs4_srvrinit */ static sysid_t lockt_sysid; /* dummy sysid for all LOCKT calls */ u_longlong_t nfs4_srv_caller_id; uint_t nfs4_srv_vkey = 0; - verifier4 Write4verf; - verifier4 Readdir4verf; - void rfs4_init_compound_state(struct compound_state *); static void nullfree(caddr_t); static void rfs4_op_inval(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); --- 151,167 ---- * */ #define DIRENT64_TO_DIRCOUNT(dp) \ (3 * BYTES_PER_XDR_UNIT + DIRENT64_NAMELEN((dp)->d_reclen)) ! zone_key_t rfs4_zone_key; static sysid_t lockt_sysid; /* dummy sysid for all LOCKT calls */ u_longlong_t nfs4_srv_caller_id; uint_t nfs4_srv_vkey = 0; void rfs4_init_compound_state(struct compound_state *); static void nullfree(caddr_t); static void rfs4_op_inval(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *);
*** 243,257 **** struct svc_req *req, struct compound_state *); static void rfs4_op_secinfo(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); static void rfs4_op_secinfo_free(nfs_resop4 *); ! static nfsstat4 check_open_access(uint32_t, ! struct compound_state *, struct svc_req *); nfsstat4 rfs4_client_sysid(rfs4_client_t *, sysid_t *); ! void rfs4_ss_clid(rfs4_client_t *); /* * translation table for attrs */ struct nfs4_ntov_table { union nfs4_attr_u *na; --- 246,261 ---- struct svc_req *req, struct compound_state *); static void rfs4_op_secinfo(nfs_argop4 *, nfs_resop4 *, struct svc_req *, struct compound_state *); static void rfs4_op_secinfo_free(nfs_resop4 *); ! static nfsstat4 check_open_access(uint32_t, struct compound_state *, ! struct svc_req *); nfsstat4 rfs4_client_sysid(rfs4_client_t *, sysid_t *); ! void rfs4_ss_clid(nfs4_srv_t *, rfs4_client_t *); + /* * translation table for attrs */ struct nfs4_ntov_table { union nfs4_attr_u *na;
*** 266,416 **** static nfsstat4 do_rfs4_set_attrs(bitmap4 *resp, fattr4 *fattrp, struct compound_state *cs, struct nfs4_svgetit_arg *sargp, struct nfs4_ntov_table *ntovp, nfs4_attr_cmd_t cmd); fem_t *deleg_rdops; fem_t *deleg_wrops; - rfs4_servinst_t *rfs4_cur_servinst = NULL; /* current server instance */ - kmutex_t rfs4_servinst_lock; /* protects linked list */ - int rfs4_seen_first_compound; /* set first time we see one */ - /* * NFS4 op dispatch table */ struct rfsv4disp { void (*dis_proc)(); /* proc to call */ void (*dis_resfree)(); /* frees space allocated by proc */ int dis_flags; /* RPC_IDEMPOTENT, etc... */ }; static struct rfsv4disp rfsv4disptab[] = { /* * NFS VERSION 4 */ /* RFS_NULL = 0 */ ! {rfs4_op_illegal, nullfree, 0}, /* UNUSED = 1 */ ! {rfs4_op_illegal, nullfree, 0}, /* UNUSED = 2 */ ! {rfs4_op_illegal, nullfree, 0}, /* OP_ACCESS = 3 */ ! {rfs4_op_access, nullfree, RPC_IDEMPOTENT}, /* OP_CLOSE = 4 */ ! {rfs4_op_close, nullfree, 0}, /* OP_COMMIT = 5 */ ! {rfs4_op_commit, nullfree, RPC_IDEMPOTENT}, /* OP_CREATE = 6 */ ! {rfs4_op_create, nullfree, 0}, /* OP_DELEGPURGE = 7 */ ! {rfs4_op_delegpurge, nullfree, 0}, /* OP_DELEGRETURN = 8 */ ! {rfs4_op_delegreturn, nullfree, 0}, /* OP_GETATTR = 9 */ ! {rfs4_op_getattr, rfs4_op_getattr_free, RPC_IDEMPOTENT}, /* OP_GETFH = 10 */ ! {rfs4_op_getfh, rfs4_op_getfh_free, RPC_ALL}, /* OP_LINK = 11 */ ! {rfs4_op_link, nullfree, 0}, /* OP_LOCK = 12 */ ! {rfs4_op_lock, lock_denied_free, 0}, /* OP_LOCKT = 13 */ ! {rfs4_op_lockt, lock_denied_free, 0}, /* OP_LOCKU = 14 */ ! {rfs4_op_locku, nullfree, 0}, /* OP_LOOKUP = 15 */ ! {rfs4_op_lookup, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK)}, /* OP_LOOKUPP = 16 */ ! {rfs4_op_lookupp, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK)}, /* OP_NVERIFY = 17 */ ! {rfs4_op_nverify, nullfree, RPC_IDEMPOTENT}, /* OP_OPEN = 18 */ ! {rfs4_op_open, rfs4_free_reply, 0}, /* OP_OPENATTR = 19 */ ! {rfs4_op_openattr, nullfree, 0}, /* OP_OPEN_CONFIRM = 20 */ ! {rfs4_op_open_confirm, nullfree, 0}, /* OP_OPEN_DOWNGRADE = 21 */ ! {rfs4_op_open_downgrade, nullfree, 0}, /* OP_OPEN_PUTFH = 22 */ ! {rfs4_op_putfh, nullfree, RPC_ALL}, /* OP_PUTPUBFH = 23 */ ! {rfs4_op_putpubfh, nullfree, RPC_ALL}, /* OP_PUTROOTFH = 24 */ ! {rfs4_op_putrootfh, nullfree, RPC_ALL}, /* OP_READ = 25 */ ! {rfs4_op_read, rfs4_op_read_free, RPC_IDEMPOTENT}, /* OP_READDIR = 26 */ ! {rfs4_op_readdir, rfs4_op_readdir_free, RPC_IDEMPOTENT}, /* OP_READLINK = 27 */ ! {rfs4_op_readlink, rfs4_op_readlink_free, RPC_IDEMPOTENT}, /* OP_REMOVE = 28 */ ! {rfs4_op_remove, nullfree, 0}, /* OP_RENAME = 29 */ ! {rfs4_op_rename, nullfree, 0}, /* OP_RENEW = 30 */ ! {rfs4_op_renew, nullfree, 0}, /* OP_RESTOREFH = 31 */ ! {rfs4_op_restorefh, nullfree, RPC_ALL}, /* OP_SAVEFH = 32 */ ! {rfs4_op_savefh, nullfree, RPC_ALL}, /* OP_SECINFO = 33 */ ! {rfs4_op_secinfo, rfs4_op_secinfo_free, 0}, /* OP_SETATTR = 34 */ ! {rfs4_op_setattr, nullfree, 0}, /* OP_SETCLIENTID = 35 */ ! {rfs4_op_setclientid, nullfree, 0}, /* OP_SETCLIENTID_CONFIRM = 36 */ ! {rfs4_op_setclientid_confirm, nullfree, 0}, /* OP_VERIFY = 37 */ ! {rfs4_op_verify, nullfree, RPC_IDEMPOTENT}, /* OP_WRITE = 38 */ ! {rfs4_op_write, nullfree, 0}, /* OP_RELEASE_LOCKOWNER = 39 */ ! 
{rfs4_op_release_lockowner, nullfree, 0}, }; static uint_t rfsv4disp_cnt = sizeof (rfsv4disptab) / sizeof (rfsv4disptab[0]); #define OP_ILLEGAL_IDX (rfsv4disp_cnt) --- 270,452 ---- static nfsstat4 do_rfs4_set_attrs(bitmap4 *resp, fattr4 *fattrp, struct compound_state *cs, struct nfs4_svgetit_arg *sargp, struct nfs4_ntov_table *ntovp, nfs4_attr_cmd_t cmd); + static void hanfsv4_failover(nfs4_srv_t *); + fem_t *deleg_rdops; fem_t *deleg_wrops; /* * NFS4 op dispatch table */ struct rfsv4disp { void (*dis_proc)(); /* proc to call */ void (*dis_resfree)(); /* frees space allocated by proc */ int dis_flags; /* RPC_IDEMPOTENT, etc... */ + int op_type; /* operation type, see below */ }; + /* + * operation types; used primarily for the per-exportinfo kstat implementation + */ + #define NFS4_OP_NOFH 0 /* The operation does not operate with any */ + /* particular filehandle; we cannot associate */ + /* it with any exportinfo. */ + + #define NFS4_OP_CFH 1 /* The operation works with the current */ + /* filehandle; we associate the operation */ + /* with the exportinfo related to the current */ + /* filehandle (as set before the operation is */ + /* executed). */ + + #define NFS4_OP_SFH 2 /* The operation works with the saved */ + /* filehandle; we associate the operation */ + /* with the exportinfo related to the saved */ + /* filehandle (as set before the operation is */ + /* executed). */ + + #define NFS4_OP_POSTCFH 3 /* The operation ignores the current */ + /* filehandle, but sets the new current */ + /* filehandle instead; we associate the */ + /* operation with the exportinfo related to */ + /* the current filehandle as set after the */ + /* operation is successfuly executed. Since */ + /* we do not know the particular exportinfo */ + /* (and thus the kstat) before the operation */ + /* is done, there is no simple way how to */ + /* update some I/O kstat statistics related */ + /* to kstat_queue(9F). */ + static struct rfsv4disp rfsv4disptab[] = { /* * NFS VERSION 4 */ /* RFS_NULL = 0 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* UNUSED = 1 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* UNUSED = 2 */ ! {rfs4_op_illegal, nullfree, 0, NFS4_OP_NOFH}, /* OP_ACCESS = 3 */ ! {rfs4_op_access, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_CLOSE = 4 */ ! {rfs4_op_close, nullfree, 0, NFS4_OP_CFH}, /* OP_COMMIT = 5 */ ! {rfs4_op_commit, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_CREATE = 6 */ ! {rfs4_op_create, nullfree, 0, NFS4_OP_CFH}, /* OP_DELEGPURGE = 7 */ ! {rfs4_op_delegpurge, nullfree, 0, NFS4_OP_NOFH}, /* OP_DELEGRETURN = 8 */ ! {rfs4_op_delegreturn, nullfree, 0, NFS4_OP_CFH}, /* OP_GETATTR = 9 */ ! {rfs4_op_getattr, rfs4_op_getattr_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_GETFH = 10 */ ! {rfs4_op_getfh, rfs4_op_getfh_free, RPC_ALL, NFS4_OP_CFH}, /* OP_LINK = 11 */ ! {rfs4_op_link, nullfree, 0, NFS4_OP_CFH}, /* OP_LOCK = 12 */ ! {rfs4_op_lock, lock_denied_free, 0, NFS4_OP_CFH}, /* OP_LOCKT = 13 */ ! {rfs4_op_lockt, lock_denied_free, 0, NFS4_OP_CFH}, /* OP_LOCKU = 14 */ ! {rfs4_op_locku, nullfree, 0, NFS4_OP_CFH}, /* OP_LOOKUP = 15 */ ! {rfs4_op_lookup, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK), ! NFS4_OP_CFH}, /* OP_LOOKUPP = 16 */ ! {rfs4_op_lookupp, nullfree, (RPC_IDEMPOTENT | RPC_PUBLICFH_OK), ! NFS4_OP_CFH}, /* OP_NVERIFY = 17 */ ! {rfs4_op_nverify, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_OPEN = 18 */ ! {rfs4_op_open, rfs4_free_reply, 0, NFS4_OP_CFH}, /* OP_OPENATTR = 19 */ ! {rfs4_op_openattr, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_CONFIRM = 20 */ ! 
{rfs4_op_open_confirm, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_DOWNGRADE = 21 */ ! {rfs4_op_open_downgrade, nullfree, 0, NFS4_OP_CFH}, /* OP_OPEN_PUTFH = 22 */ ! {rfs4_op_putfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_PUTPUBFH = 23 */ ! {rfs4_op_putpubfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_PUTROOTFH = 24 */ ! {rfs4_op_putrootfh, nullfree, RPC_ALL, NFS4_OP_POSTCFH}, /* OP_READ = 25 */ ! {rfs4_op_read, rfs4_op_read_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_READDIR = 26 */ ! {rfs4_op_readdir, rfs4_op_readdir_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_READLINK = 27 */ ! {rfs4_op_readlink, rfs4_op_readlink_free, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_REMOVE = 28 */ ! {rfs4_op_remove, nullfree, 0, NFS4_OP_CFH}, /* OP_RENAME = 29 */ ! {rfs4_op_rename, nullfree, 0, NFS4_OP_CFH}, /* OP_RENEW = 30 */ ! {rfs4_op_renew, nullfree, 0, NFS4_OP_NOFH}, /* OP_RESTOREFH = 31 */ ! {rfs4_op_restorefh, nullfree, RPC_ALL, NFS4_OP_SFH}, /* OP_SAVEFH = 32 */ ! {rfs4_op_savefh, nullfree, RPC_ALL, NFS4_OP_CFH}, /* OP_SECINFO = 33 */ ! {rfs4_op_secinfo, rfs4_op_secinfo_free, 0, NFS4_OP_CFH}, /* OP_SETATTR = 34 */ ! {rfs4_op_setattr, nullfree, 0, NFS4_OP_CFH}, /* OP_SETCLIENTID = 35 */ ! {rfs4_op_setclientid, nullfree, 0, NFS4_OP_NOFH}, /* OP_SETCLIENTID_CONFIRM = 36 */ ! {rfs4_op_setclientid_confirm, nullfree, 0, NFS4_OP_NOFH}, /* OP_VERIFY = 37 */ ! {rfs4_op_verify, nullfree, RPC_IDEMPOTENT, NFS4_OP_CFH}, /* OP_WRITE = 38 */ ! {rfs4_op_write, nullfree, 0, NFS4_OP_CFH}, /* OP_RELEASE_LOCKOWNER = 39 */ ! {rfs4_op_release_lockowner, nullfree, 0, NFS4_OP_NOFH}, }; static uint_t rfsv4disp_cnt = sizeof (rfsv4disptab) / sizeof (rfsv4disptab[0]); #define OP_ILLEGAL_IDX (rfsv4disp_cnt)
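The new op_type column classifies each NFSv4 operation by the filehandle it acts on, which is what lets rfs4_compound() charge the per-share (per-exportinfo) kstat of the right export. The stand-alone user-space sketch below shows the selection rule the table implies; select_exi(), the reduced exportinfo_t, and the demo values are illustrative only and not part of the patch.

/*
 * Sketch of how an operation's type picks the exportinfo whose per-share
 * kstat is charged.  cur_exi/saved_exi mirror cs.exi/cs.saved_exi in the
 * patch; for POSTCFH ops the export is only known after the op succeeds.
 */
#include <stdio.h>

typedef enum {
	NFS4_OP_NOFH,		/* no filehandle: charge no share */
	NFS4_OP_CFH,		/* charge the current filehandle's share */
	NFS4_OP_SFH,		/* charge the saved filehandle's share */
	NFS4_OP_POSTCFH		/* charge the share the op itself sets */
} op_type_t;

typedef struct exportinfo {
	const char *path;	/* placeholder for the kernel's exportinfo */
} exportinfo_t;

static exportinfo_t *
select_exi(op_type_t type, exportinfo_t *cur_exi, exportinfo_t *saved_exi,
    int op_succeeded)
{
	switch (type) {
	case NFS4_OP_CFH:
		return (cur_exi);
	case NFS4_OP_SFH:
		return (saved_exi);
	case NFS4_OP_POSTCFH:
		/* e.g. PUTFH: charged to the current fh the op installed */
		return (op_succeeded ? cur_exi : NULL);
	default:
		return (NULL);		/* NFS4_OP_NOFH: no share to charge */
	}
}

int
main(void)
{
	exportinfo_t cur = { "/export/a" }, saved = { "/export/b" };

	printf("CFH op -> %s\n", select_exi(NFS4_OP_CFH, &cur, &saved, 1)->path);
	printf("SFH op -> %s\n", select_exi(NFS4_OP_SFH, &cur, &saved, 1)->path);
	return (0);
}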
*** 464,474 **** "rfs4_op_release_lockowner", "rfs4_op_illegal" }; #endif ! void rfs4_ss_chkclid(rfs4_client_t *); extern size_t strlcpy(char *dst, const char *src, size_t dstsize); extern void rfs4_free_fs_locations4(fs_locations4 *); --- 500,510 ---- "rfs4_op_release_lockowner", "rfs4_op_illegal" }; #endif ! void rfs4_ss_chkclid(nfs4_srv_t *, rfs4_client_t *); extern size_t strlcpy(char *dst, const char *src, size_t dstsize); extern void rfs4_free_fs_locations4(fs_locations4 *);
*** 497,514 **** VOPNAME_SETSECATTR, { .femop_setsecattr = deleg_wr_setsecattr }, VOPNAME_VNEVENT, { .femop_vnevent = deleg_wr_vnevent }, NULL, NULL }; ! int ! rfs4_srvrinit(void) { timespec32_t verf; - int error; - extern void rfs4_attr_init(); - extern krwlock_t rfs4_deleg_policy_lock; /* * The following algorithm attempts to find a unique verifier * to be used as the write verifier returned from the server * to the client. It is important that this verifier change * whenever the server reboots. Of secondary importance, it --- 533,551 ---- VOPNAME_SETSECATTR, { .femop_setsecattr = deleg_wr_setsecattr }, VOPNAME_VNEVENT, { .femop_vnevent = deleg_wr_vnevent }, NULL, NULL }; ! /* ARGSUSED */ ! static void * ! rfs4_zone_init(zoneid_t zoneid) { + nfs4_srv_t *nsrv4; timespec32_t verf; + nsrv4 = kmem_zalloc(sizeof (*nsrv4), KM_SLEEP); + /* * The following algorithm attempts to find a unique verifier * to be used as the write verifier returned from the server * to the client. It is important that this verifier change * whenever the server reboots. Of secondary importance, it
*** 533,605 **** gethrestime(&tverf); verf.tv_sec = (time_t)tverf.tv_sec; verf.tv_nsec = tverf.tv_nsec; } ! Write4verf = *(uint64_t *)&verf; ! rfs4_attr_init(); ! mutex_init(&rfs4_deleg_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage create/destroy of server state */ ! mutex_init(&rfs4_state_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage access to server instance linked list */ ! mutex_init(&rfs4_servinst_lock, NULL, MUTEX_DEFAULT, NULL); ! /* Used to manage access to rfs4_deleg_policy */ ! rw_init(&rfs4_deleg_policy_lock, NULL, RW_DEFAULT, NULL); ! error = fem_create("deleg_rdops", nfs4_rd_deleg_tmpl, &deleg_rdops); ! if (error != 0) { rfs4_disable_delegation(); ! } else { ! error = fem_create("deleg_wrops", nfs4_wr_deleg_tmpl, ! &deleg_wrops); ! if (error != 0) { rfs4_disable_delegation(); fem_free(deleg_rdops); } - } nfs4_srv_caller_id = fs_new_caller_id(); - lockt_sysid = lm_alloc_sysidt(); - vsd_create(&nfs4_srv_vkey, NULL); ! ! return (0); } void rfs4_srvrfini(void) { - extern krwlock_t rfs4_deleg_policy_lock; - if (lockt_sysid != LM_NOSYSID) { lm_free_sysidt(lockt_sysid); lockt_sysid = LM_NOSYSID; } ! mutex_destroy(&rfs4_deleg_lock); ! mutex_destroy(&rfs4_state_lock); ! rw_destroy(&rfs4_deleg_policy_lock); fem_free(deleg_rdops); fem_free(deleg_wrops); } void rfs4_init_compound_state(struct compound_state *cs) { bzero(cs, sizeof (*cs)); cs->cont = TRUE; cs->access = CS_ACCESS_DENIED; cs->deleg = FALSE; cs->mandlock = FALSE; cs->fh.nfs_fh4_val = cs->fhbuf; } void rfs4_grace_start(rfs4_servinst_t *sip) { --- 570,689 ---- gethrestime(&tverf); verf.tv_sec = (time_t)tverf.tv_sec; verf.tv_nsec = tverf.tv_nsec; } + nsrv4->write4verf = *(uint64_t *)&verf; ! /* Used to manage create/destroy of server state */ ! nsrv4->nfs4_server_state = NULL; ! nsrv4->nfs4_cur_servinst = NULL; ! nsrv4->nfs4_deleg_policy = SRV_NEVER_DELEGATE; ! mutex_init(&nsrv4->deleg_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&nsrv4->state_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&nsrv4->servinst_lock, NULL, MUTEX_DEFAULT, NULL); ! rw_init(&nsrv4->deleg_policy_lock, NULL, RW_DEFAULT, NULL); ! return (nsrv4); ! } ! /* ARGSUSED */ ! static void ! rfs4_zone_fini(zoneid_t zoneid, void *data) ! { ! nfs4_srv_t *nsrv4 = data; ! mutex_destroy(&nsrv4->deleg_lock); ! mutex_destroy(&nsrv4->state_lock); ! mutex_destroy(&nsrv4->servinst_lock); ! rw_destroy(&nsrv4->deleg_policy_lock); ! kmem_free(nsrv4, sizeof (*nsrv4)); ! } ! void ! rfs4_srvrinit(void) ! { ! extern void rfs4_attr_init(); ! ! zone_key_create(&rfs4_zone_key, rfs4_zone_init, NULL, rfs4_zone_fini); ! ! rfs4_attr_init(); ! ! ! if (fem_create("deleg_rdops", nfs4_rd_deleg_tmpl, &deleg_rdops) != 0) { rfs4_disable_delegation(); ! } else if (fem_create("deleg_wrops", nfs4_wr_deleg_tmpl, ! &deleg_wrops) != 0) { rfs4_disable_delegation(); fem_free(deleg_rdops); } nfs4_srv_caller_id = fs_new_caller_id(); lockt_sysid = lm_alloc_sysidt(); vsd_create(&nfs4_srv_vkey, NULL); ! rfs4_state_g_init(); } void rfs4_srvrfini(void) { if (lockt_sysid != LM_NOSYSID) { lm_free_sysidt(lockt_sysid); lockt_sysid = LM_NOSYSID; } ! rfs4_state_g_fini(); fem_free(deleg_rdops); fem_free(deleg_wrops); + + (void) zone_key_delete(rfs4_zone_key); } void + rfs4_do_server_start(int server_upordown, + int srv_delegation, int cluster_booted) + { + nfs4_srv_t *nsrv4 = zone_getspecific(rfs4_zone_key, curzone); + + /* Is this a warm start? 
*/ + if (server_upordown == NFS_SERVER_QUIESCED) { + cmn_err(CE_NOTE, "nfs4_srv: " + "server was previously quiesced; " + "existing NFSv4 state will be re-used"); + + /* + * HA-NFSv4: this is also the signal + * that a Resource Group failover has + * occurred. + */ + if (cluster_booted) + hanfsv4_failover(nsrv4); + } else { + /* Cold start */ + nsrv4->rfs4_start_time = 0; + rfs4_state_zone_init(nsrv4); + nsrv4->nfs4_drc = rfs4_init_drc(nfs4_drc_max, + nfs4_drc_hash); + } + + /* Check if delegation is to be enabled */ + if (srv_delegation != FALSE) + rfs4_set_deleg_policy(nsrv4, SRV_NORMAL_DELEGATE); + } + + void rfs4_init_compound_state(struct compound_state *cs) { bzero(cs, sizeof (*cs)); cs->cont = TRUE; cs->access = CS_ACCESS_DENIED; cs->deleg = FALSE; cs->mandlock = FALSE; cs->fh.nfs_fh4_val = cs->fhbuf; + cs->statusp = NULL; } void rfs4_grace_start(rfs4_servinst_t *sip) {
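The rework above replaces the old file-scope server state with a per-zone nfs4_srv_t managed through the zone-specific-data (ZSD) interfaces: rfs4_zone_init() builds the state when a zone is created, rfs4_zone_fini() tears it down when the zone goes away, and consumers fetch their own zone's copy with zone_getspecific(). The condensed kernel-C sketch below restates that lifecycle; the example_* names and the reduced structure are stand-ins, while zone_key_create(), zone_getspecific(), kmem_zalloc() and the mutex calls are the same DDI interfaces the hunk uses. It is a fragment of the pattern, not an independently buildable module.

/*
 * Per-zone state via ZSD, as used above: a create callback allocates the
 * state for each zone, a destroy callback frees it, and runtime code looks
 * the state up with zone_getspecific() instead of touching globals.
 */
#include <sys/zone.h>
#include <sys/kmem.h>
#include <sys/mutex.h>

static zone_key_t example_zone_key;	/* stands in for rfs4_zone_key */

typedef struct example_srv {
	kmutex_t	state_lock;
	uint64_t	write4verf;	/* per-zone write verifier */
} example_srv_t;

/* ARGSUSED */
static void *
example_zone_init(zoneid_t zoneid)
{
	example_srv_t *srv = kmem_zalloc(sizeof (*srv), KM_SLEEP);

	mutex_init(&srv->state_lock, NULL, MUTEX_DEFAULT, NULL);
	return (srv);			/* becomes this zone's ZSD value */
}

/* ARGSUSED */
static void
example_zone_fini(zoneid_t zoneid, void *data)
{
	example_srv_t *srv = data;

	mutex_destroy(&srv->state_lock);
	kmem_free(srv, sizeof (*srv));
}

void
example_srvrinit(void)
{
	/* create callback, no shutdown callback, destroy callback */
	zone_key_create(&example_zone_key, example_zone_init, NULL,
	    example_zone_fini);
}

void
example_op(void)
{
	example_srv_t *srv = zone_getspecific(example_zone_key, curzone);

	mutex_enter(&srv->state_lock);
	/* ... operate on this zone's server state only ... */
	mutex_exit(&srv->state_lock);
}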
*** 650,687 **** /* * reset all currently active grace periods */ void ! rfs4_grace_reset_all(void) { rfs4_servinst_t *sip; ! mutex_enter(&rfs4_servinst_lock); ! for (sip = rfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_in_grace(sip)) rfs4_grace_start(sip); ! mutex_exit(&rfs4_servinst_lock); } /* * start any new instances' grace periods */ void ! rfs4_grace_start_new(void) { rfs4_servinst_t *sip; ! mutex_enter(&rfs4_servinst_lock); ! for (sip = rfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_grace_new(sip)) rfs4_grace_start(sip); ! mutex_exit(&rfs4_servinst_lock); } static rfs4_dss_path_t * ! rfs4_dss_newpath(rfs4_servinst_t *sip, char *path, unsigned index) { size_t len; rfs4_dss_path_t *dss_path; dss_path = kmem_alloc(sizeof (rfs4_dss_path_t), KM_SLEEP); --- 734,772 ---- /* * reset all currently active grace periods */ void ! rfs4_grace_reset_all(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip; ! mutex_enter(&nsrv4->servinst_lock); ! for (sip = nsrv4->nfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_in_grace(sip)) rfs4_grace_start(sip); ! mutex_exit(&nsrv4->servinst_lock); } /* * start any new instances' grace periods */ void ! rfs4_grace_start_new(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip; ! mutex_enter(&nsrv4->servinst_lock); ! for (sip = nsrv4->nfs4_cur_servinst; sip != NULL; sip = sip->prev) if (rfs4_servinst_grace_new(sip)) rfs4_grace_start(sip); ! mutex_exit(&nsrv4->servinst_lock); } static rfs4_dss_path_t * ! rfs4_dss_newpath(nfs4_srv_t *nsrv4, rfs4_servinst_t *sip, ! char *path, unsigned index) { size_t len; rfs4_dss_path_t *dss_path; dss_path = kmem_alloc(sizeof (rfs4_dss_path_t), KM_SLEEP);
*** 701,719 **** /* * Add to list of served paths. * No locking required, as we're only ever called at startup. */ ! if (rfs4_dss_pathlist == NULL) { /* this is the first dss_path_t */ /* needed for insque/remque */ dss_path->next = dss_path->prev = dss_path; ! rfs4_dss_pathlist = dss_path; } else { ! insque(dss_path, rfs4_dss_pathlist); } return (dss_path); } --- 786,804 ---- /* * Add to list of served paths. * No locking required, as we're only ever called at startup. */ ! if (nsrv4->dss_pathlist == NULL) { /* this is the first dss_path_t */ /* needed for insque/remque */ dss_path->next = dss_path->prev = dss_path; ! nsrv4->dss_pathlist = dss_path; } else { ! insque(dss_path, nsrv4->dss_pathlist); } return (dss_path); }
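rfs4_dss_newpath() threads each served path onto a circular doubly-linked list headed by nsrv4->dss_pathlist: the first element is linked to itself and later elements are insque()d after the head, which is why hanfsv4_failover() further down can walk the ring with a do/while loop until it is back at the head. The small user-space program below illustrates the same idiom with insque(3C); path_node_t, add_path() and the sample paths are illustrative names only.

/*
 * Circular list via insque(), as used for dss_pathlist.  insque() requires
 * the forward and backward pointers to be the first two members.
 */
#include <search.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct path_node {
	struct path_node *next;		/* must be the first two members */
	struct path_node *prev;
	char *path;
} path_node_t;

static path_node_t *
add_path(path_node_t **headp, const char *path)
{
	path_node_t *n = calloc(1, sizeof (*n));

	n->path = strdup(path);
	if (*headp == NULL) {
		n->next = n->prev = n;	/* first element points at itself */
		*headp = n;
	} else {
		insque(n, *headp);	/* link in after the head */
	}
	return (n);
}

int
main(void)
{
	path_node_t *head = NULL, *n;

	(void) add_path(&head, "/var/nfs/v4_state");
	(void) add_path(&head, "/pool1/rg1");
	(void) add_path(&head, "/pool2/rg2");

	n = head;
	do {				/* walk until back at the head */
		printf("%s\n", n->path);
		n = n->next;
	} while (n != head);
	return (0);
}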
*** 721,731 **** * Create a new server instance, and make it the currently active instance. * Note that starting the grace period too early will reduce the clients' * recovery window. */ void ! rfs4_servinst_create(int start_grace, int dss_npaths, char **dss_paths) { unsigned i; rfs4_servinst_t *sip; rfs4_oldstate_t *oldstate; --- 806,817 ---- * Create a new server instance, and make it the currently active instance. * Note that starting the grace period too early will reduce the clients' * recovery window. */ void ! rfs4_servinst_create(nfs4_srv_t *nsrv4, int start_grace, ! int dss_npaths, char **dss_paths) { unsigned i; rfs4_servinst_t *sip; rfs4_oldstate_t *oldstate;
*** 752,794 **** sip->dss_npaths = dss_npaths; sip->dss_paths = kmem_alloc(dss_npaths * sizeof (rfs4_dss_path_t *), KM_SLEEP); for (i = 0; i < dss_npaths; i++) { ! sip->dss_paths[i] = rfs4_dss_newpath(sip, dss_paths[i], i); } ! mutex_enter(&rfs4_servinst_lock); ! if (rfs4_cur_servinst != NULL) { /* add to linked list */ ! sip->prev = rfs4_cur_servinst; ! rfs4_cur_servinst->next = sip; } if (start_grace) rfs4_grace_start(sip); /* make the new instance "current" */ ! rfs4_cur_servinst = sip; ! mutex_exit(&rfs4_servinst_lock); } /* * In future, we might add a rfs4_servinst_destroy(sip) but, for now, destroy * all instances directly. */ void ! rfs4_servinst_destroy_all(void) { rfs4_servinst_t *sip, *prev, *current; #ifdef DEBUG int n = 0; #endif ! mutex_enter(&rfs4_servinst_lock); ! ASSERT(rfs4_cur_servinst != NULL); ! current = rfs4_cur_servinst; ! rfs4_cur_servinst = NULL; for (sip = current; sip != NULL; sip = prev) { prev = sip->prev; rw_destroy(&sip->rwlock); if (sip->oldstate) kmem_free(sip->oldstate, sizeof (rfs4_oldstate_t)); --- 838,881 ---- sip->dss_npaths = dss_npaths; sip->dss_paths = kmem_alloc(dss_npaths * sizeof (rfs4_dss_path_t *), KM_SLEEP); for (i = 0; i < dss_npaths; i++) { ! /* CSTYLED */ ! sip->dss_paths[i] = rfs4_dss_newpath(nsrv4, sip, dss_paths[i], i); } ! mutex_enter(&nsrv4->servinst_lock); ! if (nsrv4->nfs4_cur_servinst != NULL) { /* add to linked list */ ! sip->prev = nsrv4->nfs4_cur_servinst; ! nsrv4->nfs4_cur_servinst->next = sip; } if (start_grace) rfs4_grace_start(sip); /* make the new instance "current" */ ! nsrv4->nfs4_cur_servinst = sip; ! mutex_exit(&nsrv4->servinst_lock); } /* * In future, we might add a rfs4_servinst_destroy(sip) but, for now, destroy * all instances directly. */ void ! rfs4_servinst_destroy_all(nfs4_srv_t *nsrv4) { rfs4_servinst_t *sip, *prev, *current; #ifdef DEBUG int n = 0; #endif ! mutex_enter(&nsrv4->servinst_lock); ! ASSERT(nsrv4->nfs4_cur_servinst != NULL); ! current = nsrv4->nfs4_cur_servinst; ! nsrv4->nfs4_cur_servinst = NULL; for (sip = current; sip != NULL; sip = prev) { prev = sip->prev; rw_destroy(&sip->rwlock); if (sip->oldstate) kmem_free(sip->oldstate, sizeof (rfs4_oldstate_t));
*** 798,826 **** kmem_free(sip, sizeof (rfs4_servinst_t)); #ifdef DEBUG n++; #endif } ! mutex_exit(&rfs4_servinst_lock); } /* * Assign the current server instance to a client_t. * Should be called with cp->rc_dbe held. */ void ! rfs4_servinst_assign(rfs4_client_t *cp, rfs4_servinst_t *sip) { ASSERT(rfs4_dbe_refcnt(cp->rc_dbe) > 0); /* * The lock ensures that if the current instance is in the process * of changing, we will see the new one. */ ! mutex_enter(&rfs4_servinst_lock); cp->rc_server_instance = sip; ! mutex_exit(&rfs4_servinst_lock); } rfs4_servinst_t * rfs4_servinst(rfs4_client_t *cp) { --- 885,914 ---- kmem_free(sip, sizeof (rfs4_servinst_t)); #ifdef DEBUG n++; #endif } ! mutex_exit(&nsrv4->servinst_lock); } /* * Assign the current server instance to a client_t. * Should be called with cp->rc_dbe held. */ void ! rfs4_servinst_assign(nfs4_srv_t *nsrv4, rfs4_client_t *cp, ! rfs4_servinst_t *sip) { ASSERT(rfs4_dbe_refcnt(cp->rc_dbe) > 0); /* * The lock ensures that if the current instance is in the process * of changing, we will see the new one. */ ! mutex_enter(&nsrv4->servinst_lock); cp->rc_server_instance = sip; ! mutex_exit(&nsrv4->servinst_lock); } rfs4_servinst_t * rfs4_servinst(rfs4_client_t *cp) {
*** 877,886 **** --- 965,975 ---- secinfo4 *resok_val; struct secinfo *secp; seconfig_t *si; bool_t did_traverse = FALSE; int dotdot, walk; + nfs_export_t *ne = nfs_get_export(); dvp = cs->vp; dotdot = (nm[0] == '.' && nm[1] == '.' && nm[2] == '\0'); /*
*** 898,908 **** /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(dvp, rootdir)) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */ --- 987,997 ---- /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(dvp, ZONE_ROOTVP())) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */
*** 1015,1025 **** * * Return all flavors for a pseudo node. * For a real export node, return the flavor that the client * has access with. */ ! ASSERT(RW_LOCK_HELD(&exported_lock)); if (PSEUDO(exi)) { count = exi->exi_export.ex_seccnt; /* total sec count */ resok_val = kmem_alloc(count * sizeof (secinfo4), KM_SLEEP); secp = exi->exi_export.ex_secinfo; --- 1104,1114 ---- * * Return all flavors for a pseudo node. * For a real export node, return the flavor that the client * has access with. */ ! ASSERT(RW_LOCK_HELD(&ne->exported_lock)); if (PSEUDO(exi)) { count = exi->exi_export.ex_seccnt; /* total sec count */ resok_val = kmem_alloc(count * sizeof (secinfo4), KM_SLEEP); secp = exi->exi_export.ex_secinfo;
*** 1378,1387 **** --- 1467,1477 ---- COMMIT4res *resp = &resop->nfs_resop4_u.opcommit; int error; vnode_t *vp = cs->vp; cred_t *cr = cs->cr; vattr_t va; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__commit__start, struct compound_state *, cs, COMMIT4args *, args); if (vp == NULL) {
*** 1434,1445 **** if (error) { *cs->statusp = resp->status = puterrno4(error); goto out; } *cs->statusp = resp->status = NFS4_OK; ! resp->writeverf = Write4verf; out: DTRACE_NFSV4_2(op__commit__done, struct compound_state *, cs, COMMIT4res *, resp); } --- 1524,1536 ---- if (error) { *cs->statusp = resp->status = puterrno4(error); goto out; } + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); *cs->statusp = resp->status = NFS4_OK; ! resp->writeverf = nsrv4->write4verf; out: DTRACE_NFSV4_2(op__commit__done, struct compound_state *, cs, COMMIT4res *, resp); }
*** 2643,2653 **** /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(cs->vp, rootdir)) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */ --- 2734,2744 ---- /* * If at the system root, then can * go up no further. */ ! if (VN_CMP(cs->vp, ZONE_ROOTVP())) return (puterrno4(ENOENT)); /* * Traverse back to the mounted-on filesystem */
*** 3407,3416 **** --- 3498,3508 ---- PUTPUBFH4res *resp = &resop->nfs_resop4_u.opputpubfh; int error; vnode_t *vp; struct exportinfo *exi, *sav_exi; nfs_fh4_fmt_t *fh_fmtp; + nfs_export_t *ne = nfs_get_export(); DTRACE_NFSV4_1(op__putpubfh__start, struct compound_state *, cs); if (cs->vp) { VN_RELE(cs->vp);
*** 3420,3442 **** if (cs->cr) crfree(cs->cr); cs->cr = crdup(cs->basecr); ! vp = exi_public->exi_vp; if (vp == NULL) { *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; } ! error = makefh4(&cs->fh, vp, exi_public); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; ! if (exi_public == exi_root) { /* * No filesystem is actually shared public, so we default * to exi_root. In this case, we must check whether root * is exported. */ --- 3512,3534 ---- if (cs->cr) crfree(cs->cr); cs->cr = crdup(cs->basecr); ! vp = ne->exi_public->exi_vp; if (vp == NULL) { *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; } ! error = makefh4(&cs->fh, vp, ne->exi_public); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; ! if (ne->exi_public == ne->exi_root) { /* * No filesystem is actually shared public, so we default * to exi_root. In this case, we must check whether root * is exported. */
*** 3447,3462 **** * should use is what checkexport4 returns, because root_exi is * actually a mostly empty struct. */ exi = checkexport4(&fh_fmtp->fh4_fsid, (fid_t *)&fh_fmtp->fh4_xlen, NULL); ! cs->exi = ((exi != NULL) ? exi : exi_public); } else { /* * it's a properly shared filesystem */ ! cs->exi = exi_public; } if (is_system_labeled()) { bslabel_t *clabel; --- 3539,3554 ---- * should use is what checkexport4 returns, because root_exi is * actually a mostly empty struct. */ exi = checkexport4(&fh_fmtp->fh4_fsid, (fid_t *)&fh_fmtp->fh4_xlen, NULL); ! cs->exi = ((exi != NULL) ? exi : ne->exi_public); } else { /* * it's a properly shared filesystem */ ! cs->exi = ne->exi_public; } if (is_system_labeled()) { bslabel_t *clabel;
*** 3527,3537 **** if (cs->cr) { crfree(cs->cr); cs->cr = NULL; } - if (args->object.nfs_fh4_len < NFS_FH4_LEN) { *cs->statusp = resp->status = NFS4ERR_BADHANDLE; goto out; } --- 3619,3628 ----
*** 3594,3604 **** * Using rootdir, the system root vnode, * get its fid. */ bzero(&fid, sizeof (fid)); fid.fid_len = MAXFIDSZ; ! error = vop_fid_pseudo(rootdir, &fid); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } --- 3685,3695 ---- * Using rootdir, the system root vnode, * get its fid. */ bzero(&fid, sizeof (fid)); fid.fid_len = MAXFIDSZ; ! error = vop_fid_pseudo(ZONE_ROOTVP(), &fid); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; }
*** 3608,3618 **** * If the server root isn't exported directly, then * it should at least be a pseudo export based on * one or more exports further down in the server's * file tree. */ ! exi = checkexport4(&rootdir->v_vfsp->vfs_fsid, &fid, NULL); if (exi == NULL || exi->exi_export.ex_flags & EX_PUBLIC) { NFS4_DEBUG(rfs4_debug, (CE_WARN, "rfs4_op_putrootfh: export check failure")); *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out; --- 3699,3709 ---- * If the server root isn't exported directly, then * it should at least be a pseudo export based on * one or more exports further down in the server's * file tree. */ ! exi = checkexport4(&ZONE_ROOTVP()->v_vfsp->vfs_fsid, &fid, NULL); if (exi == NULL || exi->exi_export.ex_flags & EX_PUBLIC) { NFS4_DEBUG(rfs4_debug, (CE_WARN, "rfs4_op_putrootfh: export check failure")); *cs->statusp = resp->status = NFS4ERR_SERVERFAULT; goto out;
*** 3620,3643 **** /* * Now make a filehandle based on the root * export and root vnode. */ ! error = makefh4(&cs->fh, rootdir, exi); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; cs->exi = exi; ! VN_HOLD(rootdir); ! cs->vp = rootdir; if ((resp->status = call_checkauth4(cs, req)) != NFS4_OK) { ! VN_RELE(rootdir); cs->vp = NULL; cs->exi = sav_exi; goto out; } --- 3711,3734 ---- /* * Now make a filehandle based on the root * export and root vnode. */ ! error = makefh4(&cs->fh, ZONE_ROOTVP(), exi); if (error != 0) { *cs->statusp = resp->status = puterrno4(error); goto out; } sav_exi = cs->exi; cs->exi = exi; ! VN_HOLD(ZONE_ROOTVP()); ! cs->vp = ZONE_ROOTVP(); if ((resp->status = call_checkauth4(cs, req)) != NFS4_OK) { ! VN_RELE(cs->vp); cs->vp = NULL; cs->exi = sav_exi; goto out; }
*** 4244,4254 **** * not ENOTEMPTY, if the directory is not * empty. A System V NFS server needs to map * NFS4ERR_EXIST to NFS4ERR_NOTEMPTY to * transmit over the wire. */ ! if ((error = VOP_RMDIR(dvp, name, rootdir, cs->cr, NULL, 0)) == EEXIST) error = ENOTEMPTY; } } else { if ((error = VOP_REMOVE(dvp, name, cs->cr, NULL, 0)) == 0 && --- 4335,4345 ---- * not ENOTEMPTY, if the directory is not * empty. A System V NFS server needs to map * NFS4ERR_EXIST to NFS4ERR_NOTEMPTY to * transmit over the wire. */ ! if ((error = VOP_RMDIR(dvp, name, ZONE_ROOTVP(), cs->cr, NULL, 0)) == EEXIST) error = ENOTEMPTY; } } else { if ((error = VOP_REMOVE(dvp, name, cs->cr, NULL, 0)) == 0 &&
*** 4356,4373 **** RENAME4args *args = &argop->nfs_argop4_u.oprename; RENAME4res *resp = &resop->nfs_resop4_u.oprename; int error; vnode_t *odvp; vnode_t *ndvp; ! vnode_t *srcvp, *targvp; struct vattr obdva, oidva, oadva; struct vattr nbdva, nidva, nadva; char *onm, *nnm; uint_t olen, nlen; rfs4_file_t *fp, *sfp; int in_crit_src, in_crit_targ; int fp_rele_grant_hold, sfp_rele_grant_hold; bslabel_t *clabel; struct sockaddr *ca; char *converted_onm = NULL; char *converted_nnm = NULL; nfsstat4 status; --- 4447,4465 ---- RENAME4args *args = &argop->nfs_argop4_u.oprename; RENAME4res *resp = &resop->nfs_resop4_u.oprename; int error; vnode_t *odvp; vnode_t *ndvp; ! vnode_t *srcvp, *targvp, *tvp; struct vattr obdva, oidva, oadva; struct vattr nbdva, nidva, nadva; char *onm, *nnm; uint_t olen, nlen; rfs4_file_t *fp, *sfp; int in_crit_src, in_crit_targ; int fp_rele_grant_hold, sfp_rele_grant_hold; + int unlinked; bslabel_t *clabel; struct sockaddr *ca; char *converted_onm = NULL; char *converted_nnm = NULL; nfsstat4 status;
*** 4374,4386 **** DTRACE_NFSV4_2(op__rename__start, struct compound_state *, cs, RENAME4args *, args); fp = sfp = NULL; ! srcvp = targvp = NULL; in_crit_src = in_crit_targ = 0; fp_rele_grant_hold = sfp_rele_grant_hold = 0; /* CURRENT_FH: target directory */ ndvp = cs->vp; if (ndvp == NULL) { *cs->statusp = resp->status = NFS4ERR_NOFILEHANDLE; --- 4466,4479 ---- DTRACE_NFSV4_2(op__rename__start, struct compound_state *, cs, RENAME4args *, args); fp = sfp = NULL; ! srcvp = targvp = tvp = NULL; in_crit_src = in_crit_targ = 0; fp_rele_grant_hold = sfp_rele_grant_hold = 0; + unlinked = 0; /* CURRENT_FH: target directory */ ndvp = cs->vp; if (ndvp == NULL) { *cs->statusp = resp->status = NFS4ERR_NOFILEHANDLE;
*** 4549,4559 **** goto err_out; } } fp_rele_grant_hold = 1; - /* Check for NBMAND lock on both source and target */ if (nbl_need_check(srcvp)) { nbl_start_crit(srcvp, RW_READER); in_crit_src = 1; if (nbl_conflict(srcvp, NBL_RENAME, 0, 0, 0, NULL)) { --- 4642,4651 ----
*** 4584,4618 **** } NFS4_SET_FATTR4_CHANGE(resp->source_cinfo.before, obdva.va_ctime) NFS4_SET_FATTR4_CHANGE(resp->target_cinfo.before, nbdva.va_ctime) ! if ((error = VOP_RENAME(odvp, converted_onm, ndvp, converted_nnm, ! cs->cr, NULL, 0)) == 0 && fp != NULL) { ! struct vattr va; ! vnode_t *tvp; rfs4_dbe_lock(fp->rf_dbe); tvp = fp->rf_vp; if (tvp) VN_HOLD(tvp); rfs4_dbe_unlock(fp->rf_dbe); if (tvp) { va.va_mask = AT_NLINK; if (!VOP_GETATTR(tvp, &va, 0, cs->cr, NULL) && va.va_nlink == 0) { ! /* The file is gone and so should the state */ ! if (in_crit_targ) { ! nbl_end_crit(targvp); ! in_crit_targ = 0; } ! rfs4_close_all_state(fp); ! } VN_RELE(tvp); } } if (error == 0) vn_renamepath(ndvp, srcvp, nnm, nlen - 1); if (in_crit_src) nbl_end_crit(srcvp); --- 4676,4720 ---- } NFS4_SET_FATTR4_CHANGE(resp->source_cinfo.before, obdva.va_ctime) NFS4_SET_FATTR4_CHANGE(resp->target_cinfo.before, nbdva.va_ctime) ! error = VOP_RENAME(odvp, converted_onm, ndvp, converted_nnm, cs->cr, ! NULL, 0); + /* + * If target existed and was unlinked by VOP_RENAME, state will need + * closed. To avoid deadlock, rfs4_close_all_state will be done after + * any necessary nbl_end_crit on srcvp and tgtvp. + */ + if (error == 0 && fp != NULL) { rfs4_dbe_lock(fp->rf_dbe); tvp = fp->rf_vp; if (tvp) VN_HOLD(tvp); rfs4_dbe_unlock(fp->rf_dbe); if (tvp) { + struct vattr va; va.va_mask = AT_NLINK; + if (!VOP_GETATTR(tvp, &va, 0, cs->cr, NULL) && va.va_nlink == 0) { ! unlinked = 1; ! ! /* DEBUG data */ ! if ((srcvp == targvp) || (tvp != targvp)) { ! cmn_err(CE_WARN, "rfs4_op_rename: " ! "srcvp %p, targvp: %p, tvp: %p", ! (void *)srcvp, (void *)targvp, ! (void *)tvp); } ! } else { VN_RELE(tvp); } } + } if (error == 0) vn_renamepath(ndvp, srcvp, nnm, nlen - 1); if (in_crit_src) nbl_end_crit(srcvp);
*** 4621,4630 **** --- 4723,4747 ---- if (in_crit_targ) nbl_end_crit(targvp); if (targvp) VN_RELE(targvp); + if (unlinked) { + ASSERT(fp != NULL); + ASSERT(tvp != NULL); + + /* DEBUG data */ + if (RW_READ_HELD(&tvp->v_nbllock)) { + cmn_err(CE_WARN, "rfs4_op_rename: " + "RW_READ_HELD(%p)", (void *)tvp); + } + + /* The file is gone and so should the state */ + rfs4_close_all_state(fp); + VN_RELE(tvp); + } + if (sfp) { rfs4_clear_dont_grant(sfp); rfs4_file_rele(sfp); } if (fp) {
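This hunk is the NEX-15740 deadlock fix itself: VOP_RENAME() no longer leads straight into rfs4_close_all_state() while the NBMAND critical regions on the source and target vnodes are still held. Instead the code only records that the target was unlinked (keeping a hold on tvp), drops nbl_end_crit() on both vnodes, and tears the state down afterwards. The kernel-C fragment below condenses that ordering; rename_teardown_sketch() is an illustrative helper that reuses only calls visible in the hunk and omits the DEBUG checks and error paths.

/*
 * Condensed ordering of the fix: leave every nbmand critical region
 * before closing the NFSv4 state for the unlinked target file.
 */
static void
rename_teardown_sketch(vnode_t *srcvp, vnode_t *targvp, vnode_t *tvp,
    rfs4_file_t *fp, int in_crit_src, int in_crit_targ, int unlinked)
{
	if (in_crit_src)
		nbl_end_crit(srcvp);
	if (srcvp)
		VN_RELE(srcvp);
	if (in_crit_targ)
		nbl_end_crit(targvp);
	if (targvp)
		VN_RELE(targvp);

	if (unlinked) {
		/* safe now: no critical region is held across this call */
		rfs4_close_all_state(fp);
		VN_RELE(tvp);
	}
}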
*** 5557,5566 **** --- 5674,5684 ---- cred_t *savecred, *cr; bool_t *deleg = &cs->deleg; nfsstat4 stat; int in_crit = 0; caller_context_t ct; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__write__start, struct compound_state *, cs, WRITE4args *, args); vp = cs->vp;
*** 5627,5641 **** if (MANDLOCK(vp, bva.va_mode)) { *cs->statusp = resp->status = NFS4ERR_ACCESS; goto out; } if (args->data_len == 0) { *cs->statusp = resp->status = NFS4_OK; resp->count = 0; resp->committed = args->stable; ! resp->writeverf = Write4verf; goto out; } if (args->mblk != NULL) { mblk_t *m; --- 5745,5760 ---- if (MANDLOCK(vp, bva.va_mode)) { *cs->statusp = resp->status = NFS4ERR_ACCESS; goto out; } + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); if (args->data_len == 0) { *cs->statusp = resp->status = NFS4_OK; resp->count = 0; resp->committed = args->stable; ! resp->writeverf = nsrv4->write4verf; goto out; } if (args->mblk != NULL) { mblk_t *m;
*** 5727,5737 **** if (ioflag == 0) resp->committed = UNSTABLE4; else resp->committed = FILE_SYNC4; ! resp->writeverf = Write4verf; out: if (in_crit) nbl_end_crit(vp); --- 5846,5856 ---- if (ioflag == 0) resp->committed = UNSTABLE4; else resp->committed = FILE_SYNC4; ! resp->writeverf = nsrv4->write4verf; out: if (in_crit) nbl_end_crit(vp);
*** 5747,5756 **** --- 5866,5877 ---- rfs4_compound(COMPOUND4args *args, COMPOUND4res *resp, struct exportinfo *exi, struct svc_req *req, cred_t *cr, int *rv) { uint_t i; struct compound_state cs; + nfs4_srv_t *nsrv4; + nfs_export_t *ne = nfs_get_export(); if (rv != NULL) *rv = 0; rfs4_init_compound_state(&cs); /*
*** 5804,5813 **** --- 5925,5935 ---- resp->array_len = args->array_len; resp->array = kmem_zalloc(args->array_len * sizeof (nfs_resop4), KM_SLEEP); cs.basecr = cr; + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); DTRACE_NFSV4_2(compound__start, struct compound_state *, &cs, COMPOUND4args *, args); /*
*** 5818,5841 **** * per proc (excluding public exinfo), and exi_count design * is sufficient to protect concurrent execution of NFS2/3 * ops along with unexport. This lock will be removed as * part of the NFSv4 phase 2 namespace redesign work. */ ! rw_enter(&exported_lock, RW_READER); /* * If this is the first compound we've seen, we need to start all * new instances' grace periods. */ ! if (rfs4_seen_first_compound == 0) { ! rfs4_grace_start_new(); /* * This must be set after rfs4_grace_start_new(), otherwise * another thread could proceed past here before the former * is finished. */ ! rfs4_seen_first_compound = 1; } for (i = 0; i < args->array_len && cs.cont; i++) { nfs_argop4 *argop; nfs_resop4 *resop; --- 5940,5963 ---- * per proc (excluding public exinfo), and exi_count design * is sufficient to protect concurrent execution of NFS2/3 * ops along with unexport. This lock will be removed as * part of the NFSv4 phase 2 namespace redesign work. */ ! rw_enter(&ne->exported_lock, RW_READER); /* * If this is the first compound we've seen, we need to start all * new instances' grace periods. */ ! if (nsrv4->seen_first_compound == 0) { ! rfs4_grace_start_new(nsrv4); /* * This must be set after rfs4_grace_start_new(), otherwise * another thread could proceed past here before the former * is finished. */ ! nsrv4->seen_first_compound = 1; } for (i = 0; i < args->array_len && cs.cont; i++) { nfs_argop4 *argop; nfs_resop4 *resop;
*** 5845,5868 **** --- 5967,6052 ---- resop = &resp->array[i]; resop->resop = argop->argop; op = (uint_t)resop->resop; if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + kstat_t *exi_ksp = NULL; + /* * Count the individual ops here; NULL and COMPOUND * are counted in common_dispatch() */ rfsproccnt_v4_ptr[op].value.ui64++; + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + kstat_runq_enter(KSTAT_IO_PTR(ksp)); + mutex_exit(ksp->ks_lock); + } + + switch (rfsv4disptab[op].op_type) { + case NFS4_OP_CFH: + resop->exi = cs.exi; + break; + case NFS4_OP_SFH: + resop->exi = cs.saved_exi; + break; + default: + ASSERT(resop->exi == NULL); + break; + } + + if (resop->exi != NULL) { + exi_ksp = NULL; + if (resop->exi->exi_kstats != NULL) { + exi_ksp = exp_kstats_v4( + resop->exi->exi_kstats, op); + } + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + kstat_runq_enter(KSTAT_IO_PTR(exi_ksp)); + mutex_exit(exi_ksp->ks_lock); + } + } + NFS4_DEBUG(rfs4_debug > 1, (CE_NOTE, "Executing %s", rfs4_op_string[op])); (*rfsv4disptab[op].dis_proc)(argop, resop, req, &cs); NFS4_DEBUG(rfs4_debug > 1, (CE_NOTE, "%s returned %d", rfs4_op_string[op], *cs.statusp)); if (*cs.statusp != NFS4_OK) cs.cont = FALSE; + + if (rfsv4disptab[op].op_type == NFS4_OP_POSTCFH && + *cs.statusp == NFS4_OK && + (resop->exi = cs.exi) != NULL) { + exi_ksp = NULL; + if (resop->exi->exi_kstats != NULL) { + exi_ksp = exp_kstats_v4( + resop->exi->exi_kstats, op); + } + } + + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + KSTAT_IO_PTR(exi_ksp)->nwritten += + argop->opsize; + KSTAT_IO_PTR(exi_ksp)->writes++; + if (rfsv4disptab[op].op_type != NFS4_OP_POSTCFH) + kstat_runq_exit(KSTAT_IO_PTR(exi_ksp)); + mutex_exit(exi_ksp->ks_lock); } else { + resop->exi = NULL; + } + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + kstat_runq_exit(KSTAT_IO_PTR(ksp)); + mutex_exit(ksp->ks_lock); + } + } else { /* * This is effectively dead code since XDR code * will have already returned BADXDR if op doesn't * decode to legal value. This only done for a * day when XDR code doesn't verify v4 opcodes.
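Each dispatched operation is now bracketed by kstat_runq_enter()/kstat_runq_exit() on the per-operation I/O kstat, and on the per-export kstat when the op_type lets the code identify one, so the kstat_queue(9F) machinery accumulates run-queue (service) time per op and per share. The kernel-C sketch below isolates that bracketing; op_with_runq_accounting() and the dis_proc parameter are illustrative, while kstat_runq_enter(), kstat_runq_exit(), KSTAT_IO_PTR() and ks_lock are the interfaces the hunk uses, and ksp is assumed to be a KSTAT_TYPE_IO kstat created elsewhere.

/*
 * kstat_queue(9F) bracketing around one operation: mark the request as in
 * service before dispatch and take it out of the run queue afterwards,
 * holding ks_lock around each kstat update.
 */
#include <sys/kstat.h>

static void
op_with_runq_accounting(kstat_t *ksp, void (*dis_proc)(void))
{
	if (ksp != NULL) {
		mutex_enter(ksp->ks_lock);
		kstat_runq_enter(KSTAT_IO_PTR(ksp));	/* op enters service */
		mutex_exit(ksp->ks_lock);
	}

	dis_proc();				/* execute the operation */

	if (ksp != NULL) {
		mutex_enter(ksp->ks_lock);
		kstat_runq_exit(KSTAT_IO_PTR(ksp));	/* accumulates run time */
		mutex_exit(ksp->ks_lock);
	}
}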
*** 5873,5907 **** rfs4_op_illegal(argop, resop, req, &cs); cs.cont = FALSE; } /* * If not at last op, and if we are to stop, then * compact the results array. */ if ((i + 1) < args->array_len && !cs.cont) { nfs_resop4 *new_res = kmem_alloc( ! (i+1) * sizeof (nfs_resop4), KM_SLEEP); bcopy(resp->array, ! new_res, (i+1) * sizeof (nfs_resop4)); kmem_free(resp->array, args->array_len * sizeof (nfs_resop4)); resp->array_len = i + 1; resp->array = new_res; } } ! rw_exit(&exported_lock); ! DTRACE_NFSV4_2(compound__done, struct compound_state *, &cs, ! COMPOUND4res *, resp); ! if (cs.vp) VN_RELE(cs.vp); if (cs.saved_vp) VN_RELE(cs.saved_vp); if (cs.saved_fh.nfs_fh4_val) kmem_free(cs.saved_fh.nfs_fh4_val, NFS4_FHSIZE); if (cs.basecr) crfree(cs.basecr); --- 6057,6106 ---- rfs4_op_illegal(argop, resop, req, &cs); cs.cont = FALSE; } /* + * The exi saved in the resop to be used for kstats update + * once the opsize is calculated during XDR response encoding. + * Put a hold on resop->exi so that it can't be destroyed. + */ + if (resop->exi != NULL) + exi_hold(resop->exi); + + /* * If not at last op, and if we are to stop, then * compact the results array. */ if ((i + 1) < args->array_len && !cs.cont) { nfs_resop4 *new_res = kmem_alloc( ! (i + 1) * sizeof (nfs_resop4), KM_SLEEP); bcopy(resp->array, ! new_res, (i + 1) * sizeof (nfs_resop4)); kmem_free(resp->array, args->array_len * sizeof (nfs_resop4)); resp->array_len = i + 1; resp->array = new_res; } } ! rw_exit(&ne->exported_lock); ! /* ! * clear exportinfo and vnode fields from compound_state before dtrace ! * probe, to avoid tracing residual values for path and share path. ! */ if (cs.vp) VN_RELE(cs.vp); if (cs.saved_vp) VN_RELE(cs.saved_vp); + cs.exi = cs.saved_exi = NULL; + cs.vp = cs.saved_vp = NULL; + + DTRACE_NFSV4_2(compound__done, struct compound_state *, &cs, + COMPOUND4res *, resp); + if (cs.saved_fh.nfs_fh4_val) kmem_free(cs.saved_fh.nfs_fh4_val, NFS4_FHSIZE); if (cs.basecr) crfree(cs.basecr);
*** 5967,5976 **** --- 6166,6262 ---- flag = 0; } *flagp = flag; } + /* + * Update the kstats for the received requests. + * Note: writes/nwritten are used to hold count and nbytes of requests received. + * + * Per export request statistics need to be updated during the compound request + * processing (rfs4_compound()) as that is where it is known which exportinfo to + * associate the kstats with. + */ + void + rfs4_compound_kstat_args(COMPOUND4args *args) + { + int i; + + for (i = 0; i < args->array_len; i++) { + uint_t op = (uint_t)args->array[i].argop; + + if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + KSTAT_IO_PTR(ksp)->nwritten += + args->array[i].opsize; + KSTAT_IO_PTR(ksp)->writes++; + mutex_exit(ksp->ks_lock); + } + } + } + } + + /* + * Update the kstats for the sent responses. + * Note: reads/nread are used to hold count and nbytes of responses sent. + * + * Per export response statistics cannot be updated until here, after the + * response send has generated the opsize (bytes sent) in the XDR encoding. + * The exportinfo with which the kstats should be associated is thus saved + * in the response structure (by rfs4_compound()) for use here. A hold is + * placed on the exi to ensure it cannot be deleted before use. This hold + * is released, and the exi set to NULL, here. + */ + void + rfs4_compound_kstat_res(COMPOUND4res *res) + { + int i; + nfs_export_t *ne = nfs_get_export(); + + for (i = 0; i < res->array_len; i++) { + uint_t op = (uint_t)res->array[i].resop; + + if (op < rfsv4disp_cnt) { + kstat_t *ksp = rfsprocio_v4_ptr[op]; + struct exportinfo *exi = res->array[i].exi; + + if (ksp != NULL) { + mutex_enter(ksp->ks_lock); + KSTAT_IO_PTR(ksp)->nread += + res->array[i].opsize; + KSTAT_IO_PTR(ksp)->reads++; + mutex_exit(ksp->ks_lock); + } + + if (exi != NULL) { + kstat_t *exi_ksp = NULL; + + rw_enter(&ne->exported_lock, RW_READER); + + if (exi->exi_kstats != NULL) { + /*CSTYLED*/ + exi_ksp = exp_kstats_v4(exi->exi_kstats, op); + } + if (exi_ksp != NULL) { + mutex_enter(exi_ksp->ks_lock); + KSTAT_IO_PTR(exi_ksp)->nread += + res->array[i].opsize; + KSTAT_IO_PTR(exi_ksp)->reads++; + mutex_exit(exi_ksp->ks_lock); + } + + exi_rele(&exi); + res->array[i].exi = NULL; + rw_exit(&ne->exported_lock); + } + } + } + } + nfsstat4 rfs4_client_sysid(rfs4_client_t *cp, sysid_t *sp) { nfsstat4 e;
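As the comments above note, the I/O kstats are used asymmetrically: writes/nwritten count requests received and reads/nread count responses sent (response sizes are only known after XDR encoding, which is why the exportinfo is carried in the resop with a hold). The user-space libkstat sketch below shows how such a KSTAT_TYPE_IO kstat could be read back; the module and name strings ("nfs", "rfsprocio_v4_read") are placeholders, since the actual kstat names are created outside this diff.

/*
 * Reading one of the per-operation I/O kstats from userland (sketch).
 * Build with: cc -o dumpks dumpks.c -lkstat
 */
#include <kstat.h>
#include <stdio.h>

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();
	kstat_t *ksp;
	kstat_io_t kio;

	if (kc == NULL)
		return (1);
	/* placeholder module/name -- substitute the real kstat names */
	ksp = kstat_lookup(kc, "nfs", -1, "rfsprocio_v4_read");
	if (ksp == NULL || kstat_read(kc, ksp, &kio) == -1) {
		(void) kstat_close(kc);
		return (1);
	}
	printf("requests: %llu (%llu bytes)  responses: %llu (%llu bytes)\n",
	    (u_longlong_t)kio.writes, (u_longlong_t)kio.nwritten,
	    (u_longlong_t)kio.reads, (u_longlong_t)kio.nread);
	(void) kstat_close(kc);
	return (0);
}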
*** 6601,6629 **** */ if (trunc) { int in_crit = 0; rfs4_file_t *fp; bool_t create = FALSE; /* * We are writing over an existing file. * Check to see if we need to recall a delegation. */ ! rfs4_hold_deleg_policy(); if ((fp = rfs4_findfile(vp, NULL, &create)) != NULL) { if (rfs4_check_delegated_byfp(FWRITE, fp, (reqsize == 0), FALSE, FALSE, &clientid)) { rfs4_file_rele(fp); ! rfs4_rele_deleg_policy(); VN_RELE(vp); *attrset = 0; return (NFS4ERR_DELAY); } rfs4_file_rele(fp); } ! rfs4_rele_deleg_policy(); if (nbl_need_check(vp)) { in_crit = 1; ASSERT(reqsize == 0); --- 6887,6917 ---- */ if (trunc) { int in_crit = 0; rfs4_file_t *fp; + nfs4_srv_t *nsrv4; bool_t create = FALSE; /* * We are writing over an existing file. * Check to see if we need to recall a delegation. */ ! nsrv4 = zone_getspecific(rfs4_zone_key, curzone); ! rfs4_hold_deleg_policy(nsrv4); if ((fp = rfs4_findfile(vp, NULL, &create)) != NULL) { if (rfs4_check_delegated_byfp(FWRITE, fp, (reqsize == 0), FALSE, FALSE, &clientid)) { rfs4_file_rele(fp); ! rfs4_rele_deleg_policy(nsrv4); VN_RELE(vp); *attrset = 0; return (NFS4ERR_DELAY); } rfs4_file_rele(fp); } ! rfs4_rele_deleg_policy(nsrv4); if (nbl_need_check(vp)) { in_crit = 1; ASSERT(reqsize == 0);
*** 8177,8191 **** --- 8465,8481 ---- SETCLIENTID_CONFIRM4args *args = &argop->nfs_argop4_u.opsetclientid_confirm; SETCLIENTID_CONFIRM4res *res = &resop->nfs_resop4_u.opsetclientid_confirm; rfs4_client_t *cp, *cptoclose = NULL; + nfs4_srv_t *nsrv4; DTRACE_NFSV4_2(op__setclientid__confirm__start, struct compound_state *, cs, SETCLIENTID_CONFIRM4args *, args); + nsrv4 = zone_getspecific(rfs4_zone_key, curzone); *cs->statusp = res->status = NFS4_OK; cp = rfs4_findclient_by_id(args->clientid, TRUE); if (cp == NULL) {
*** 8217,8234 **** /* * Update the client's associated server instance, if it's changed * since the client was created. */ ! if (rfs4_servinst(cp) != rfs4_cur_servinst) ! rfs4_servinst_assign(cp, rfs4_cur_servinst); /* * Record clientid in stable storage. * Must be done after server instance has been assigned. */ ! rfs4_ss_clid(cp); rfs4_dbe_unlock(cp->rc_dbe); if (cptoclose) /* don't need to rele, client_close does it */ --- 8507,8524 ---- /* * Update the client's associated server instance, if it's changed * since the client was created. */ ! if (rfs4_servinst(cp) != nsrv4->nfs4_cur_servinst) ! rfs4_servinst_assign(nsrv4, cp, nsrv4->nfs4_cur_servinst); /* * Record clientid in stable storage. * Must be done after server instance has been assigned. */ ! rfs4_ss_clid(nsrv4, cp); rfs4_dbe_unlock(cp->rc_dbe); if (cptoclose) /* don't need to rele, client_close does it */
*** 8239,8249 **** rfs4_update_lease(cp); /* * Check to see if client can perform reclaims */ ! rfs4_ss_chkclid(cp); rfs4_client_rele(cp); out: DTRACE_NFSV4_2(op__setclientid__confirm__done, --- 8529,8539 ---- rfs4_update_lease(cp); /* * Check to see if client can perform reclaims */ ! rfs4_ss_chkclid(nsrv4, cp); rfs4_client_rele(cp); out: DTRACE_NFSV4_2(op__setclientid__confirm__done,
*** 9883,9888 **** --- 10173,10342 ---- if (ci == NULL) return (0); is_downrev = ci->ri_no_referrals; rfs4_dbe_rele(ci->ri_dbe); return (is_downrev); + } + + /* + * Do the main work of handling HA-NFSv4 Resource Group failover on + * Sun Cluster. + * We need to detect whether any RG admin paths have been added or removed, + * and adjust resources accordingly. + * Currently we're using a very inefficient algorithm, ~ 2 * O(n**2). In + * order to scale, the list and array of paths need to be held in more + * suitable data structures. + */ + static void + hanfsv4_failover(nfs4_srv_t *nsrv4) + { + int i, start_grace, numadded_paths = 0; + char **added_paths = NULL; + rfs4_dss_path_t *dss_path; + + /* + * Note: currently, dss_pathlist cannot be NULL, since + * it will always include an entry for NFS4_DSS_VAR_DIR. If we + * make the latter dynamically specified too, the following will + * need to be adjusted. + */ + + /* + * First, look for removed paths: RGs that have been failed-over + * away from this node. + * Walk the "currently-serving" dss_pathlist and, for each + * path, check if it is on the "passed-in" rfs4_dss_newpaths array + * from nfsd. If not, that RG path has been removed. + * + * Note that nfsd has sorted rfs4_dss_newpaths for us, and removed + * any duplicates. + */ + dss_path = nsrv4->dss_pathlist; + do { + int found = 0; + char *path = dss_path->path; + + /* used only for non-HA so may not be removed */ + if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) { + dss_path = dss_path->next; + continue; + } + + for (i = 0; i < rfs4_dss_numnewpaths; i++) { + int cmpret; + char *newpath = rfs4_dss_newpaths[i]; + + /* + * Since nfsd has sorted rfs4_dss_newpaths for us, + * once the return from strcmp is negative we know + * we've passed the point where "path" should be, + * and can stop searching: "path" has been removed. + */ + cmpret = strcmp(path, newpath); + if (cmpret < 0) + break; + if (cmpret == 0) { + found = 1; + break; + } + } + + if (found == 0) { + unsigned index = dss_path->index; + rfs4_servinst_t *sip = dss_path->sip; + rfs4_dss_path_t *path_next = dss_path->next; + + /* + * This path has been removed. + * We must clear out the servinst reference to + * it, since it's now owned by another + * node: we should not attempt to touch it. + */ + ASSERT(dss_path == sip->dss_paths[index]); + sip->dss_paths[index] = NULL; + + /* remove from "currently-serving" list, and destroy */ + remque(dss_path); + /* allow for NUL */ + kmem_free(dss_path->path, strlen(dss_path->path) + 1); + kmem_free(dss_path, sizeof (rfs4_dss_path_t)); + + dss_path = path_next; + } else { + /* path was found; not removed */ + dss_path = dss_path->next; + } + } while (dss_path != nsrv4->dss_pathlist); + + /* + * Now, look for added paths: RGs that have been failed-over + * to this node. + * Walk the "passed-in" rfs4_dss_newpaths array from nfsd and, + * for each path, check if it is on the "currently-serving" + * dss_pathlist. If not, that RG path has been added. + * + * Note: we don't do duplicate detection here; nfsd does that for us. + * + * Note: numadded_paths <= rfs4_dss_numnewpaths, which gives us + * an upper bound for the size needed for added_paths[numadded_paths]. 
+ */ + + /* probably more space than we need, but guaranteed to be enough */ + if (rfs4_dss_numnewpaths > 0) { + size_t sz = rfs4_dss_numnewpaths * sizeof (char *); + added_paths = kmem_zalloc(sz, KM_SLEEP); + } + + /* walk the "passed-in" rfs4_dss_newpaths array from nfsd */ + for (i = 0; i < rfs4_dss_numnewpaths; i++) { + int found = 0; + char *newpath = rfs4_dss_newpaths[i]; + + dss_path = nsrv4->dss_pathlist; + do { + char *path = dss_path->path; + + /* used only for non-HA */ + if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) { + dss_path = dss_path->next; + continue; + } + + if (strncmp(path, newpath, strlen(path)) == 0) { + found = 1; + break; + } + + dss_path = dss_path->next; + } while (dss_path != nsrv4->dss_pathlist); + + if (found == 0) { + added_paths[numadded_paths] = newpath; + numadded_paths++; + } + } + + /* did we find any added paths? */ + if (numadded_paths > 0) { + + /* create a new server instance, and start its grace period */ + start_grace = 1; + /* CSTYLED */ + rfs4_servinst_create(nsrv4, start_grace, numadded_paths, added_paths); + + /* read in the stable storage state from these paths */ + rfs4_dss_readstate(nsrv4, numadded_paths, added_paths); + + /* + * Multiple failovers during a grace period will cause + * clients of the same resource group to be partitioned + * into different server instances, with different + * grace periods. Since clients of the same resource + * group must be subject to the same grace period, + * we need to reset all currently active grace periods. + */ + rfs4_grace_reset_all(nsrv4); + } + + if (rfs4_dss_numnewpaths > 0) + kmem_free(added_paths, rfs4_dss_numnewpaths * sizeof (char *)); }
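The removed-path scan above leans on nfsd having sorted rfs4_dss_newpaths: as soon as strcmp(path, newpath) goes negative, every remaining candidate sorts after path, so the path cannot appear later in the array and the loop can stop early. The small user-space example below isolates that early-exit membership test; path_in_sorted_list() and the sample paths are illustrative only.

/*
 * Early-exit membership test over a sorted array, as used to detect
 * resource-group paths that have failed over away from this node.
 */
#include <stdio.h>
#include <string.h>

static int
path_in_sorted_list(const char *path, const char *const *sorted, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int cmp = strcmp(path, sorted[i]);

		if (cmp == 0)
			return (1);	/* still served by this node */
		if (cmp < 0)
			return (0);	/* passed its sort position: removed */
	}
	return (0);
}

int
main(void)
{
	const char *newpaths[] = { "/pool1/rg1", "/pool2/rg2", "/pool3/rg3" };

	printf("%d\n", path_in_sorted_list("/pool2/rg2", newpaths, 3));
	printf("%d\n", path_in_sorted_list("/pool9/rg9", newpaths, 3));
	return (0);
}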