NEX-16917 Need to reduce the impact of NFS per-share kstats on failover
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-16712 NFS dtrace providers do not support per-share filtering
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-15279 support NFS server in zone
NEX-15520 online NFS shares cause zoneadm halt to hang in nfs_export_zone_fini
Portions contributed by: Dan Kruchinin dan.kruchinin@nexenta.com
Portions contributed by: Stepan Zastupov stepan.zastupov@gmail.com
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-9275 Got "bad mutex" panic when run IO to nfs share from clients
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-14051 Be careful with RPC groups
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
8085 Handle RPC groups better
Reviewed by: "Joshua M. Clulow" <josh@sysmgr.org>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-7366 Getting panic in "module "nfssrv" due to a NULL pointer dereference" when updating NFS shares on a pool
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-6778 NFS kstats leak and cause system to hang
Revert "NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats"
This reverts commit 586c3ab1927647487f01c337ddc011c642575a52.
Revert "NEX-5354 Aggregated IOPS, bandwidth, and latency kstats for NFS server"
This reverts commit c91d7614da8618ef48018102b077f60ecbbac8c2.
Revert "NEX-5667 nfssrv_stats_flags does not work for aggregated kstats"
This reverts commit 3dcf42618be7dd5f408c327f429c81e07ca08e74.
Revert "NEX-5750 Time values for aggregated NFS server kstats should be normalized"
This reverts commit 1f4d4f901153b0191027969fa4a8064f9d3b9ee1.
Revert "NEX-5942 Panic in rfs4_minorvers_mismatch() with NFSv4.1 client"
This reverts commit 40766417094a162f5e4cc8786c0fa0a7e5871cd9.
Revert "NEX-5752 NFS server: namespace collision in kstats"
This reverts commit ae81e668db86050da8e483264acb0cce0444a132.
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-4261 Per-client NFS server IOPS, bandwidth, and latency kstats
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-3097 IOPS, bandwidth, and latency kstats for NFS server
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-1974 Support for more than 16 groups with AUTH_SYS
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-1128 NFS server: Generic uid and gid remapping for AUTH_SYS
Reviewed by: Jan Kryl <jan.kryl@nexenta.com>
re #13613 rb4516 Tunables needs volatile keyword

          --- old/usr/src/uts/common/fs/nfs/nfs_server.c
          +++ new/usr/src/uts/common/fs/nfs/nfs_server.c
[10 lines elided]
  11   11   * and limitations under the License.
  12   12   *
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
       21 +
  21   22  /*
  22   23   * Copyright (c) 1990, 2010, Oracle and/or its affiliates. All rights reserved.
  23      - * Copyright (c) 2011 Bayard G. Bell. All rights reserved.
  24      - * Copyright (c) 2013 by Delphix. All rights reserved.
  25      - * Copyright 2014 Nexenta Systems, Inc.  All rights reserved.
  26      - * Copyright (c) 2017 Joyent Inc
  27   24   */
  28   25  
  29   26  /*
  30   27   *      Copyright (c) 1983,1984,1985,1986,1987,1988,1989  AT&T.
  31   28   *      All rights reserved.
  32   29   *      Use is subject to license terms.
  33   30   */
  34   31  
       32 +/*
       33 + * Copyright (c) 2011 Bayard G. Bell. All rights reserved.
       34 + * Copyright (c) 2013 by Delphix. All rights reserved.
       35 + * Copyright 2018 Nexenta Systems, Inc.
       36 + * Copyright (c) 2017 Joyent Inc
       37 + */
       38 +
  35   39  #include <sys/param.h>
  36   40  #include <sys/types.h>
  37   41  #include <sys/systm.h>
  38   42  #include <sys/cred.h>
  39   43  #include <sys/proc.h>
  40   44  #include <sys/user.h>
  41   45  #include <sys/buf.h>
  42   46  #include <sys/vfs.h>
  43   47  #include <sys/vnode.h>
  44   48  #include <sys/pathname.h>
[31 lines elided]
  76   80  #include <rpc/svc.h>
  77   81  #include <rpc/xdr.h>
  78   82  #include <rpc/rpc_rdma.h>
  79   83  
  80   84  #include <nfs/nfs.h>
  81   85  #include <nfs/export.h>
  82   86  #include <nfs/nfssys.h>
  83   87  #include <nfs/nfs_clnt.h>
  84   88  #include <nfs/nfs_acl.h>
  85   89  #include <nfs/nfs_log.h>
  86      -#include <nfs/nfs_cmd.h>
  87   90  #include <nfs/lm.h>
  88   91  #include <nfs/nfs_dispatch.h>
  89   92  #include <nfs/nfs4_drc.h>
  90   93  
  91   94  #include <sys/modctl.h>
  92   95  #include <sys/cladm.h>
  93   96  #include <sys/clconf.h>
  94   97  
  95   98  #include <sys/tsol/label.h>
  96   99  
[5 lines elided]
 102  105   */
 103  106  
 104  107  static struct modlmisc modlmisc = {
 105  108          &mod_miscops, "NFS server module"
 106  109  };
 107  110  
 108  111  static struct modlinkage modlinkage = {
 109  112          MODREV_1, (void *)&modlmisc, NULL
 110  113  };
 111  114  
      115 +zone_key_t nfssrv_zone_key;
 112  116  kmem_cache_t *nfs_xuio_cache;
 113  117  int nfs_loaned_buffers = 0;
 114  118  
 115  119  int
 116  120  _init(void)
 117  121  {
 118  122          int status;
 119  123  
 120      -        if ((status = nfs_srvinit()) != 0) {
 121      -                cmn_err(CE_WARN, "_init: nfs_srvinit failed");
 122      -                return (status);
 123      -        }
      124 +        nfs_srvinit();
 124  125  
 125  126          status = mod_install((struct modlinkage *)&modlinkage);
 126  127          if (status != 0) {
 127  128                  /*
 128  129                   * Could not load module, cleanup previous
 129  130                   * initialization work.
 130  131                   */
 131  132                  nfs_srvfini();
 132  133  
 133  134                  return (status);
[36 lines elided]
 170  171  /*
 171  172   * PUBLICFH_CHECK() checks if the dispatch routine supports
 172  173   * RPC_PUBLICFH_OK, if the filesystem is exported public, and if the
 173  174   * incoming request is using the public filehandle. The check duplicates
 174  175   * the exportmatch() call done in checkexport(), and we should consider
 175  176   * modifying those routines to avoid the duplication. For now, we optimize
 176  177   * by calling exportmatch() only after checking that the dispatch routine
 177  178   * supports RPC_PUBLICFH_OK, and if the filesystem is explicitly exported
 178  179   * public (i.e., not the placeholder).
 179  180   */
 180      -#define PUBLICFH_CHECK(disp, exi, fsid, xfid) \
      181 +#define PUBLICFH_CHECK(ne, disp, exi, fsid, xfid) \
 181  182                  ((disp->dis_flags & RPC_PUBLICFH_OK) && \
 182  183                  ((exi->exi_export.ex_flags & EX_PUBLIC) || \
 183      -                (exi == exi_public && exportmatch(exi_root, \
      184 +                (exi == ne->exi_public && exportmatch(ne->exi_root, \
 184  185                  fsid, xfid))))
 185  186  
 186  187  static void     nfs_srv_shutdown_all(int);
 187      -static void     rfs4_server_start(int);
      188 +static void     rfs4_server_start(nfs_globals_t *, int);
 188  189  static void     nullfree(void);
 189  190  static void     rfs_dispatch(struct svc_req *, SVCXPRT *);
 190  191  static void     acl_dispatch(struct svc_req *, SVCXPRT *);
 191  192  static void     common_dispatch(struct svc_req *, SVCXPRT *,
 192  193                  rpcvers_t, rpcvers_t, char *,
 193  194                  struct rpc_disptable *);
 194      -static void     hanfsv4_failover(void);
 195  195  static  int     checkauth(struct exportinfo *, struct svc_req *, cred_t *, int,
 196  196                  bool_t, bool_t *);
 197  197  static char     *client_name(struct svc_req *req);
 198  198  static char     *client_addr(struct svc_req *req, char *buf);
 199  199  extern  int     sec_svc_getcred(struct svc_req *, cred_t *cr, char **, int *);
 200  200  extern  bool_t  sec_svc_inrootlist(int, caddr_t, int, caddr_t *);
      201 +static void     *nfs_srv_zone_init(zoneid_t);
      202 +static void     nfs_srv_zone_fini(zoneid_t, void *);
 201  203  
 202  204  #define NFSLOG_COPY_NETBUF(exi, xprt, nb)       {               \
 203  205          (nb)->maxlen = (xprt)->xp_rtaddr.maxlen;                \
 204  206          (nb)->len = (xprt)->xp_rtaddr.len;                      \
 205  207          (nb)->buf = kmem_alloc((nb)->len, KM_SLEEP);            \
 206  208          bcopy((xprt)->xp_rtaddr.buf, (nb)->buf, (nb)->len);     \
 207  209          }
 208  210  
 209  211  /*
 210  212   * Public Filehandle common nfs routines
[30 lines elided]
 241  243  };
 242  244  
 243  245  static SVC_CALLOUT __nfs_sc_rdma[] = {
 244  246          { NFS_PROGRAM,     NFS_VERSMIN,     NFS_VERSMAX,        rfs_dispatch },
 245  247          { NFS_ACL_PROGRAM, NFS_ACL_VERSMIN, NFS_ACL_VERSMAX,    acl_dispatch }
 246  248  };
 247  249  
 248  250  static SVC_CALLOUT_TABLE nfs_sct_rdma = {
 249  251          sizeof (__nfs_sc_rdma) / sizeof (__nfs_sc_rdma[0]), FALSE, __nfs_sc_rdma
 250  252  };
 251      -rpcvers_t nfs_versmin = NFS_VERSMIN_DEFAULT;
 252      -rpcvers_t nfs_versmax = NFS_VERSMAX_DEFAULT;
 253  253  
 254  254  /*
 255      - * Used to track the state of the server so that initialization
 256      - * can be done properly.
 257      - */
 258      -typedef enum {
 259      -        NFS_SERVER_STOPPED,     /* server state destroyed */
 260      -        NFS_SERVER_STOPPING,    /* server state being destroyed */
 261      -        NFS_SERVER_RUNNING,
 262      -        NFS_SERVER_QUIESCED,    /* server state preserved */
 263      -        NFS_SERVER_OFFLINE      /* server pool offline */
 264      -} nfs_server_running_t;
 265      -
 266      -static nfs_server_running_t nfs_server_upordown;
 267      -static kmutex_t nfs_server_upordown_lock;
 268      -static  kcondvar_t nfs_server_upordown_cv;
 269      -
 270      -/*
 271  255   * DSS: distributed stable storage
 272  256   * lists of all DSS paths: current, and before last warmstart
 273  257   */
 274  258  nvlist_t *rfs4_dss_paths, *rfs4_dss_oldpaths;
 275  259  
 276      -int rfs4_dispatch(struct rpcdisp *, struct svc_req *, SVCXPRT *, char *);
      260 +int rfs4_dispatch(struct rpcdisp *, struct svc_req *, SVCXPRT *, char *,
      261 +    size_t *);
 277  262  bool_t rfs4_minorvers_mismatch(struct svc_req *, SVCXPRT *, void *);
 278  263  
 279  264  /*
 280      - * RDMA wait variables.
 281      - */
 282      -static kcondvar_t rdma_wait_cv;
 283      -static kmutex_t rdma_wait_mutex;
 284      -
 285      -/*
 286  265   * Will be called at the point the server pool is being unregistered
 287  266   * from the pool list. From that point onwards, the pool is waiting
 288  267   * to be drained and as such the server state is stale and pertains
 289  268   * to the old instantiation of the NFS server pool.
 290  269   */
 291  270  void
 292  271  nfs_srv_offline(void)
 293  272  {
 294      -        mutex_enter(&nfs_server_upordown_lock);
 295      -        if (nfs_server_upordown == NFS_SERVER_RUNNING) {
 296      -                nfs_server_upordown = NFS_SERVER_OFFLINE;
      273 +        nfs_globals_t *ng;
      274 +
      275 +        ng = zone_getspecific(nfssrv_zone_key, curzone);
      276 +
      277 +        mutex_enter(&ng->nfs_server_upordown_lock);
      278 +        if (ng->nfs_server_upordown == NFS_SERVER_RUNNING) {
      279 +                ng->nfs_server_upordown = NFS_SERVER_OFFLINE;
 297  280          }
 298      -        mutex_exit(&nfs_server_upordown_lock);
      281 +        mutex_exit(&ng->nfs_server_upordown_lock);
 299  282  }
 300  283  
 301  284  /*
 302  285   * Will be called at the point the server pool is being destroyed so
 303  286   * all transports have been closed and no service threads are in
 304  287   * existence.
 305  288   *
 306  289   * If we quiesce the server, we're shutting it down without destroying the
 307  290   * server state. This allows it to warm start subsequently.
 308  291   */
[8 lines elided]
 317  300   * This alternative shutdown routine can be requested via nfssys()
 318  301   */
 319  302  void
 320  303  nfs_srv_quiesce_all(void)
 321  304  {
 322  305          int quiesce = 1;
 323  306          nfs_srv_shutdown_all(quiesce);
 324  307  }
 325  308  
 326  309  static void
 327      -nfs_srv_shutdown_all(int quiesce) {
 328      -        mutex_enter(&nfs_server_upordown_lock);
      310 +nfs_srv_shutdown_all(int quiesce)
      311 +{
      312 +        nfs_globals_t *ng = zone_getspecific(nfssrv_zone_key, curzone);
      313 +
      314 +        mutex_enter(&ng->nfs_server_upordown_lock);
 329  315          if (quiesce) {
 330      -                if (nfs_server_upordown == NFS_SERVER_RUNNING ||
 331      -                        nfs_server_upordown == NFS_SERVER_OFFLINE) {
 332      -                        nfs_server_upordown = NFS_SERVER_QUIESCED;
 333      -                        cv_signal(&nfs_server_upordown_cv);
      316 +                if (ng->nfs_server_upordown == NFS_SERVER_RUNNING ||
      317 +                    ng->nfs_server_upordown == NFS_SERVER_OFFLINE) {
      318 +                        ng->nfs_server_upordown = NFS_SERVER_QUIESCED;
      319 +                        cv_signal(&ng->nfs_server_upordown_cv);
 334  320  
 335  321                          /* reset DSS state, for subsequent warm restart */
 336  322                          rfs4_dss_numnewpaths = 0;
 337  323                          rfs4_dss_newpaths = NULL;
 338  324  
 339  325                          cmn_err(CE_NOTE, "nfs_server: server is now quiesced; "
 340  326                              "NFSv4 state has been preserved");
 341  327                  }
 342  328          } else {
 343      -                if (nfs_server_upordown == NFS_SERVER_OFFLINE) {
 344      -                        nfs_server_upordown = NFS_SERVER_STOPPING;
 345      -                        mutex_exit(&nfs_server_upordown_lock);
 346      -                        rfs4_state_fini();
 347      -                        rfs4_fini_drc(nfs4_drc);
 348      -                        mutex_enter(&nfs_server_upordown_lock);
 349      -                        nfs_server_upordown = NFS_SERVER_STOPPED;
 350      -                        cv_signal(&nfs_server_upordown_cv);
      329 +                if (ng->nfs_server_upordown == NFS_SERVER_OFFLINE) {
      330 +                        ng->nfs_server_upordown = NFS_SERVER_STOPPING;
      331 +                        mutex_exit(&ng->nfs_server_upordown_lock);
      332 +                        rfs4_state_zone_fini();
      333 +                        rfs4_fini_drc();
      334 +                        mutex_enter(&ng->nfs_server_upordown_lock);
      335 +                        ng->nfs_server_upordown = NFS_SERVER_STOPPED;
      336 +                        cv_signal(&ng->nfs_server_upordown_cv);
 351  337                  }
 352  338          }
 353      -        mutex_exit(&nfs_server_upordown_lock);
      339 +        mutex_exit(&ng->nfs_server_upordown_lock);
 354  340  }
 355  341  
 356  342  static int
 357  343  nfs_srv_set_sc_versions(struct file *fp, SVC_CALLOUT_TABLE **sctpp,
 358      -                        rpcvers_t versmin, rpcvers_t versmax)
      344 +    rpcvers_t versmin, rpcvers_t versmax)
 359  345  {
 360  346          struct strioctl strioc;
 361  347          struct T_info_ack tinfo;
 362  348          int             error, retval;
 363  349  
 364  350          /*
 365  351           * Find out what type of transport this is.
 366  352           */
 367  353          strioc.ic_cmd = TI_GETINFO;
 368  354          strioc.ic_timout = -1;
[42 lines elided]
 411  397  }
 412  398  
 413  399  /*
 414  400   * NFS Server system call.
 415  401   * Does all of the work of running a NFS server.
 416  402   * uap->fd is the fd of an open transport provider
 417  403   */
 418  404  int
 419  405  nfs_svc(struct nfs_svc_args *arg, model_t model)
 420  406  {
      407 +        nfs_globals_t *ng;
 421  408          file_t *fp;
 422  409          SVCMASTERXPRT *xprt;
 423  410          int error;
 424  411          int readsize;
 425  412          char buf[KNC_STRSIZE];
 426  413          size_t len;
 427  414          STRUCT_HANDLE(nfs_svc_args, uap);
 428  415          struct netbuf addrmask;
 429  416          SVC_CALLOUT_TABLE *sctp = NULL;
 430  417  
 431  418  #ifdef lint
 432  419          model = model;          /* STRUCT macros don't always refer to it */
 433  420  #endif
 434  421  
      422 +        ng = zone_getspecific(nfssrv_zone_key, curzone);
 435  423          STRUCT_SET_HANDLE(uap, model, arg);
 436  424  
 437  425          /* Check privileges in nfssys() */
 438  426  
 439  427          if ((fp = getf(STRUCT_FGET(uap, fd))) == NULL)
 440  428                  return (EBADF);
 441  429  
 442  430          /*
 443  431           * Set read buffer size to rsize
 444  432           * and add room for RPC headers.
[13 lines elided]
 458  446          addrmask.maxlen = STRUCT_FGET(uap, addrmask.maxlen);
 459  447          addrmask.buf = kmem_alloc(addrmask.maxlen, KM_SLEEP);
 460  448          error = copyin(STRUCT_FGETP(uap, addrmask.buf), addrmask.buf,
 461  449              addrmask.len);
 462  450          if (error) {
 463  451                  releasef(STRUCT_FGET(uap, fd));
 464  452                  kmem_free(addrmask.buf, addrmask.maxlen);
 465  453                  return (error);
 466  454          }
 467  455  
 468      -        nfs_versmin = STRUCT_FGET(uap, versmin);
 469      -        nfs_versmax = STRUCT_FGET(uap, versmax);
      456 +        ng->nfs_versmin = STRUCT_FGET(uap, versmin);
      457 +        ng->nfs_versmax = STRUCT_FGET(uap, versmax);
 470  458  
 471  459          /* Double check the vers min/max ranges */
 472      -        if ((nfs_versmin > nfs_versmax) ||
 473      -            (nfs_versmin < NFS_VERSMIN) ||
 474      -            (nfs_versmax > NFS_VERSMAX)) {
 475      -                nfs_versmin = NFS_VERSMIN_DEFAULT;
 476      -                nfs_versmax = NFS_VERSMAX_DEFAULT;
      460 +        if ((ng->nfs_versmin > ng->nfs_versmax) ||
      461 +            (ng->nfs_versmin < NFS_VERSMIN) ||
      462 +            (ng->nfs_versmax > NFS_VERSMAX)) {
      463 +                ng->nfs_versmin = NFS_VERSMIN_DEFAULT;
      464 +                ng->nfs_versmax = NFS_VERSMAX_DEFAULT;
 477  465          }
 478  466  
 479      -        if (error =
 480      -            nfs_srv_set_sc_versions(fp, &sctp, nfs_versmin, nfs_versmax)) {
      467 +        if (error = nfs_srv_set_sc_versions(fp, &sctp, ng->nfs_versmin,
      468 +            ng->nfs_versmax)) {
 481  469                  releasef(STRUCT_FGET(uap, fd));
 482  470                  kmem_free(addrmask.buf, addrmask.maxlen);
 483  471                  return (error);
 484  472          }
 485  473  
 486  474          /* Initialize nfsv4 server */
 487      -        if (nfs_versmax == (rpcvers_t)NFS_V4)
 488      -                rfs4_server_start(STRUCT_FGET(uap, delegation));
      475 +        if (ng->nfs_versmax == (rpcvers_t)NFS_V4)
      476 +                rfs4_server_start(ng, STRUCT_FGET(uap, delegation));
 489  477  
 490  478          /* Create a transport handle. */
 491  479          error = svc_tli_kcreate(fp, readsize, buf, &addrmask, &xprt,
 492  480              sctp, NULL, NFS_SVCPOOL_ID, TRUE);
 493  481  
 494  482          if (error)
 495  483                  kmem_free(addrmask.buf, addrmask.maxlen);
 496  484  
 497  485          releasef(STRUCT_FGET(uap, fd));
 498  486  
 499  487          /* HA-NFSv4: save the cluster nodeid */
 500  488          if (cluster_bootflags & CLUSTER_BOOTED)
 501  489                  lm_global_nlmid = clconf_get_nodeid();
 502  490  
 503  491          return (error);
 504  492  }
 505  493  
 506  494  static void
 507      -rfs4_server_start(int nfs4_srv_delegation)
      495 +rfs4_server_start(nfs_globals_t *ng, int nfs4_srv_delegation)
 508  496  {
 509  497          /*
 510  498           * Determine if the server has previously been "started" and
 511  499           * if not, do the per instance initialization
 512  500           */
 513      -        mutex_enter(&nfs_server_upordown_lock);
      501 +        mutex_enter(&ng->nfs_server_upordown_lock);
 514  502  
 515      -        if (nfs_server_upordown != NFS_SERVER_RUNNING) {
      503 +        if (ng->nfs_server_upordown != NFS_SERVER_RUNNING) {
 516  504                  /* Do we need to stop and wait on the previous server? */
 517      -                while (nfs_server_upordown == NFS_SERVER_STOPPING ||
 518      -                    nfs_server_upordown == NFS_SERVER_OFFLINE)
 519      -                        cv_wait(&nfs_server_upordown_cv,
 520      -                            &nfs_server_upordown_lock);
      505 +                while (ng->nfs_server_upordown == NFS_SERVER_STOPPING ||
      506 +                    ng->nfs_server_upordown == NFS_SERVER_OFFLINE)
      507 +                        cv_wait(&ng->nfs_server_upordown_cv,
      508 +                            &ng->nfs_server_upordown_lock);
 521  509  
 522      -                if (nfs_server_upordown != NFS_SERVER_RUNNING) {
      510 +                if (ng->nfs_server_upordown != NFS_SERVER_RUNNING) {
 523  511                          (void) svc_pool_control(NFS_SVCPOOL_ID,
 524  512                              SVCPSET_UNREGISTER_PROC, (void *)&nfs_srv_offline);
 525  513                          (void) svc_pool_control(NFS_SVCPOOL_ID,
 526  514                              SVCPSET_SHUTDOWN_PROC, (void *)&nfs_srv_stop_all);
 527  515  
 528      -                        /* is this an nfsd warm start? */
 529      -                        if (nfs_server_upordown == NFS_SERVER_QUIESCED) {
 530      -                                cmn_err(CE_NOTE, "nfs_server: "
 531      -                                    "server was previously quiesced; "
 532      -                                    "existing NFSv4 state will be re-used");
      516 +                        rfs4_do_server_start(ng->nfs_server_upordown,
      517 +                            nfs4_srv_delegation,
      518 +                            cluster_bootflags & CLUSTER_BOOTED);
 533  519  
 534      -                                /*
 535      -                                 * HA-NFSv4: this is also the signal
 536      -                                 * that a Resource Group failover has
 537      -                                 * occurred.
 538      -                                 */
 539      -                                if (cluster_bootflags & CLUSTER_BOOTED)
 540      -                                        hanfsv4_failover();
 541      -                        } else {
 542      -                                /* cold start */
 543      -                                rfs4_state_init();
 544      -                                nfs4_drc = rfs4_init_drc(nfs4_drc_max,
 545      -                                    nfs4_drc_hash);
 546      -                        }
 547      -
 548      -                        /*
 549      -                         * Check to see if delegation is to be
 550      -                         * enabled at the server
 551      -                         */
 552      -                        if (nfs4_srv_delegation != FALSE)
 553      -                                rfs4_set_deleg_policy(SRV_NORMAL_DELEGATE);
 554      -
 555      -                        nfs_server_upordown = NFS_SERVER_RUNNING;
      520 +                        ng->nfs_server_upordown = NFS_SERVER_RUNNING;
 556  521                  }
 557      -                cv_signal(&nfs_server_upordown_cv);
      522 +                cv_signal(&ng->nfs_server_upordown_cv);
 558  523          }
 559      -        mutex_exit(&nfs_server_upordown_lock);
      524 +        mutex_exit(&ng->nfs_server_upordown_lock);
 560  525  }
 561  526  
 562  527  /*
 563  528   * If RDMA device available,
 564  529   * start RDMA listener.
 565  530   */
 566  531  int
 567  532  rdma_start(struct rdma_svc_args *rsa)
 568  533  {
      534 +        nfs_globals_t *ng;
 569  535          int error;
 570  536          rdma_xprt_group_t started_rdma_xprts;
 571  537          rdma_stat stat;
 572  538          int svc_state = 0;
 573  539  
 574  540          /* Double check the vers min/max ranges */
 575  541          if ((rsa->nfs_versmin > rsa->nfs_versmax) ||
 576  542              (rsa->nfs_versmin < NFS_VERSMIN) ||
 577  543              (rsa->nfs_versmax > NFS_VERSMAX)) {
 578  544                  rsa->nfs_versmin = NFS_VERSMIN_DEFAULT;
 579  545                  rsa->nfs_versmax = NFS_VERSMAX_DEFAULT;
 580  546          }
 581      -        nfs_versmin = rsa->nfs_versmin;
 582      -        nfs_versmax = rsa->nfs_versmax;
 583  547  
      548 +        ng = zone_getspecific(nfssrv_zone_key, curzone);
      549 +        ng->nfs_versmin = rsa->nfs_versmin;
      550 +        ng->nfs_versmax = rsa->nfs_versmax;
      551 +
 584  552          /* Set the versions in the callout table */
 585  553          __nfs_sc_rdma[0].sc_versmin = rsa->nfs_versmin;
 586  554          __nfs_sc_rdma[0].sc_versmax = rsa->nfs_versmax;
 587  555          /* For the NFS_ACL program, check the max version */
 588  556          __nfs_sc_rdma[1].sc_versmin = rsa->nfs_versmin;
 589  557          if (rsa->nfs_versmax > NFS_ACL_VERSMAX)
 590  558                  __nfs_sc_rdma[1].sc_versmax = NFS_ACL_VERSMAX;
 591  559          else
 592  560                  __nfs_sc_rdma[1].sc_versmax = rsa->nfs_versmax;
 593  561  
 594  562          /* Initialize nfsv4 server */
 595  563          if (rsa->nfs_versmax == (rpcvers_t)NFS_V4)
 596      -                rfs4_server_start(rsa->delegation);
      564 +                rfs4_server_start(ng, rsa->delegation);
 597  565  
 598  566          started_rdma_xprts.rtg_count = 0;
 599  567          started_rdma_xprts.rtg_listhead = NULL;
 600  568          started_rdma_xprts.rtg_poolid = rsa->poolid;
 601  569  
 602  570  restart:
 603  571          error = svc_rdma_kcreate(rsa->netid, &nfs_sct_rdma, rsa->poolid,
 604  572              &started_rdma_xprts);
 605  573  
 606  574          svc_state = !error;
 607  575  
 608  576          while (!error) {
 609  577  
 610  578                  /*
 611  579                   * wait till either interrupted by a signal on
 612  580                   * nfs service stop/restart or signalled by a
 613      -                 * rdma plugin attach/detatch.
      581 +                 * rdma attach/detatch.
 614  582                   */
 615  583  
 616  584                  stat = rdma_kwait();
 617  585  
 618  586                  /*
 619  587                   * stop services if running -- either on a HCA detach event
 620  588                   * or if the nfs service is stopped/restarted.
 621  589                   */
 622  590  
 623  591                  if ((stat == RDMA_HCA_DETACH || stat == RDMA_INTR) &&
[30 lines elided]
 654  622  rpc_null(caddr_t *argp, caddr_t *resp, struct exportinfo *exi,
 655  623      struct svc_req *req, cred_t *cr, bool_t ro)
 656  624  {
 657  625  }
 658  626  
 659  627  /* ARGSUSED */
 660  628  void
 661  629  rpc_null_v3(caddr_t *argp, caddr_t *resp, struct exportinfo *exi,
 662  630      struct svc_req *req, cred_t *cr, bool_t ro)
 663  631  {
 664      -        DTRACE_NFSV3_3(op__null__start, struct svc_req *, req,
 665      -            cred_t *, cr, vnode_t *, NULL);
 666      -        DTRACE_NFSV3_3(op__null__done, struct svc_req *, req,
 667      -            cred_t *, cr, vnode_t *, NULL);
      632 +        DTRACE_NFSV3_4(op__null__start, struct svc_req *, req,
      633 +            cred_t *, cr, vnode_t *, NULL, struct exportinfo *, exi);
      634 +        DTRACE_NFSV3_4(op__null__done, struct svc_req *, req,
      635 +            cred_t *, cr, vnode_t *, NULL, struct exportinfo *, exi);
 668  636  }
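The hunk above, widening DTRACE_NFSV3_3 to DTRACE_NFSV3_4 so every probe also carries the struct exportinfo, is the substance of NEX-16712: it lets D scripts filter by share. A hedged sketch of such a script, which would need an illumos kernel with this change; the argument index and the noi_shrpath member name are illustrative assumptions, not confirmed by this diff:

```d
/*
 * Count NFSv3 operations against a single share.  Assumes the
 * provider's translator surfaces the share path from the new
 * exportinfo argument (member name illustrative).
 */
nfsv3:::op-*-start
/args[1]->noi_shrpath == "/export/home"/
{
	@ops[probename] = count();
}
```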
 669  637  
 670  638  /* ARGSUSED */
 671  639  static void
 672  640  rfs_error(caddr_t *argp, caddr_t *resp, struct exportinfo *exi,
 673  641      struct svc_req *req, cred_t *cr, bool_t ro)
 674  642  {
 675  643          /* return (EOPNOTSUPP); */
 676  644  }
 677  645  
[657 lines elided]
1335 1303          /* RFS_NULL = 0 */
1336 1304  
1337 1305          /* RFS4_COMPOUND = 1 */
1338 1306          COMPOUND4res nfs4_compound_res;
1339 1307  
1340 1308  };
1341 1309  
1342 1310  static struct rpc_disptable rfs_disptable[] = {
1343 1311          {sizeof (rfsdisptab_v2) / sizeof (rfsdisptab_v2[0]),
1344 1312              rfscallnames_v2,
1345      -            &rfsproccnt_v2_ptr, rfsdisptab_v2},
     1313 +            &rfsproccnt_v2_ptr, &rfsprocio_v2_ptr, rfsdisptab_v2},
1346 1314          {sizeof (rfsdisptab_v3) / sizeof (rfsdisptab_v3[0]),
1347 1315              rfscallnames_v3,
1348      -            &rfsproccnt_v3_ptr, rfsdisptab_v3},
     1316 +            &rfsproccnt_v3_ptr, &rfsprocio_v3_ptr, rfsdisptab_v3},
1349 1317          {sizeof (rfsdisptab_v4) / sizeof (rfsdisptab_v4[0]),
1350 1318              rfscallnames_v4,
1351      -            &rfsproccnt_v4_ptr, rfsdisptab_v4},
     1319 +            &rfsproccnt_v4_ptr, &rfsprocio_v4_ptr, rfsdisptab_v4},
1352 1320  };
1353 1321  
1354 1322  /*
1355 1323   * If nfs_portmon is set, then clients are required to use privileged
1356 1324   * ports (ports < IPPORT_RESERVED) in order to get NFS services.
1357 1325   *
1358 1326   * N.B.: this attempt to carry forward the already ill-conceived notion
1359 1327   * of privileged ports for TCP/UDP is really quite ineffectual.  Not only
1360 1328   * is it transport-dependent, it's laughably easy to spoof.  If you're
1361 1329   * really interested in security, you must start with secure RPC instead.
1362 1330   */
1363      -static int nfs_portmon = 0;
     1331 +volatile int nfs_portmon = 0;
1364 1332  
1365 1333  #ifdef DEBUG
1366 1334  static int cred_hits = 0;
1367 1335  static int cred_misses = 0;
1368 1336  #endif
1369 1337  
1370      -
1371 1338  #ifdef DEBUG
1372 1339  /*
1373 1340   * Debug code to allow disabling of rfs_dispatch() use of
1374 1341   * fastxdrargs() and fastxdrres() calls for testing purposes.
1375 1342   */
1376 1343  static int rfs_no_fast_xdrargs = 0;
1377 1344  static int rfs_no_fast_xdrres = 0;
1378 1345  #endif
1379 1346  
1380 1347  union acl_args {
[86 lines elided]
1467 1434                  LOOKUP3res *resp = (LOOKUP3res *)res;
1468 1435                  if ((enum wnfsstat)resp->status == WNFSERR_CLNT_FLAVOR)
1469 1436                          return (TRUE);
1470 1437          }
1471 1438          return (FALSE);
1472 1439  }
1473 1440  
1474 1441  
1475 1442  static void
1476 1443  common_dispatch(struct svc_req *req, SVCXPRT *xprt, rpcvers_t min_vers,
1477      -                rpcvers_t max_vers, char *pgmname,
1478      -                struct rpc_disptable *disptable)
     1444 +    rpcvers_t max_vers, char *pgmname, struct rpc_disptable *disptable)
1479 1445  {
1480 1446          int which;
1481 1447          rpcvers_t vers;
1482 1448          char *args;
1483 1449          union {
1484 1450                          union rfs_args ra;
1485 1451                          union acl_args aa;
1486 1452                  } args_buf;
1487 1453          char *res;
1488 1454          union {
(12 lines elided)
1501 1467          int authres;
1502 1468          bool_t publicfh_ok = FALSE;
1503 1469          enum_t auth_flavor;
1504 1470          bool_t dupcached = FALSE;
1505 1471          struct netbuf   nb;
1506 1472          bool_t logging_enabled = FALSE;
1507 1473          struct exportinfo *nfslog_exi = NULL;
1508 1474          char **procnames;
1509 1475          char cbuf[INET6_ADDRSTRLEN];    /* to hold both IPv4 and IPv6 addr */
1510 1476          bool_t ro = FALSE;
     1477 +        kstat_t *ksp = NULL;
     1478 +        kstat_t *exi_ksp = NULL;
     1479 +        size_t pos;                     /* request size */
     1480 +        size_t rlen;                    /* reply size */
     1481 +        bool_t rsent = FALSE;           /* reply was sent successfully */
     1482 +        nfs_export_t *ne = nfs_get_export();
1511 1483  
1512 1484          vers = req->rq_vers;
1513 1485  
1514 1486          if (vers < min_vers || vers > max_vers) {
1515 1487                  svcerr_progvers(req->rq_xprt, min_vers, max_vers);
1516 1488                  error++;
1517 1489                  cmn_err(CE_NOTE, "%s: bad version number %u", pgmname, vers);
1518 1490                  goto done;
1519 1491          }
1520 1492          vers -= min_vers;
1521 1493  
1522 1494          which = req->rq_proc;
1523 1495          if (which < 0 || which >= disptable[(int)vers].dis_nprocs) {
1524 1496                  svcerr_noproc(req->rq_xprt);
1525 1497                  error++;
1526 1498                  goto done;
1527 1499          }
1528 1500  
1529 1501          (*(disptable[(int)vers].dis_proccntp))[which].value.ui64++;
1530 1502  
     1503 +        ksp = (*(disptable[(int)vers].dis_prociop))[which];
     1504 +        if (ksp != NULL) {
     1505 +                mutex_enter(ksp->ks_lock);
     1506 +                kstat_runq_enter(KSTAT_IO_PTR(ksp));
     1507 +                mutex_exit(ksp->ks_lock);
     1508 +        }
     1509 +        pos = XDR_GETPOS(&xprt->xp_xdrin);
     1510 +
1531 1511          disp = &disptable[(int)vers].dis_table[which];
1532 1512          procnames = disptable[(int)vers].dis_procnames;
1533 1513  
1534 1514          auth_flavor = req->rq_cred.oa_flavor;
1535 1515  
1536 1516          /*
1537 1517           * Deserialize into the args struct.
1538 1518           */
1539 1519          args = (char *)&args_buf;
1540 1520  
(23 lines elided)
1564 1544                              pgmname, vers + min_vers, procnames[which],
1565 1545                              client_name(req), client_addr(req, cbuf));
1566 1546                          goto done;
1567 1547                  }
1568 1548          }
1569 1549  
1570 1550          /*
1571 1551           * If Version 4 use that specific dispatch function.
1572 1552           */
1573 1553          if (req->rq_vers == 4) {
1574      -                error += rfs4_dispatch(disp, req, xprt, args);
     1554 +                error += rfs4_dispatch(disp, req, xprt, args, &rlen);
     1555 +                if (error == 0)
     1556 +                        rsent = TRUE;
1575 1557                  goto done;
1576 1558          }
1577 1559  
1578 1560          dis_flags = disp->dis_flags;
1579 1561  
1580 1562          /*
1581 1563           * Find export information and check authentication,
1582 1564           * setting the credential if everything is ok.
1583 1565           */
1584 1566          if (disp->dis_getfh != NULL) {
(40 lines elided)
1625 1607                   */
1626 1608  
1627 1609                  if ((dis_flags & RPC_ALLOWANON) && EQFID(fid, xfid))
1628 1610                          anon_ok = 1;
1629 1611                  else
1630 1612                          anon_ok = 0;
1631 1613  
1632 1614                  cr = xprt->xp_cred;
1633 1615                  ASSERT(cr != NULL);
1634 1616  #ifdef DEBUG
1635      -                if (crgetref(cr) != 1) {
1636      -                        crfree(cr);
1637      -                        cr = crget();
1638      -                        xprt->xp_cred = cr;
1639      -                        cred_misses++;
1640      -                } else
1641      -                        cred_hits++;
     1617 +                {
     1618 +                        if (crgetref(cr) != 1) {
     1619 +                                crfree(cr);
     1620 +                                cr = crget();
     1621 +                                xprt->xp_cred = cr;
     1622 +                                cred_misses++;
     1623 +                        } else
     1624 +                                cred_hits++;
     1625 +                }
1642 1626  #else
1643 1627                  if (crgetref(cr) != 1) {
1644 1628                          crfree(cr);
1645 1629                          cr = crget();
1646 1630                          xprt->xp_cred = cr;
1647 1631                  }
1648 1632  #endif
1649 1633  
1650 1634                  exi = checkexport(fsid, xfid);
1651 1635  
1652 1636                  if (exi != NULL) {
1653      -                        publicfh_ok = PUBLICFH_CHECK(disp, exi, fsid, xfid);
     1637 +                        rw_enter(&ne->exported_lock, RW_READER);
     1638 +                        exi_ksp = NULL;
1654 1639  
     1640 +                        if (exi->exi_kstats != NULL) {
     1641 +                                switch (req->rq_vers) {
     1642 +                                case NFS_VERSION:
     1643 +                                        exi_ksp = exp_kstats_v2(exi->exi_kstats,
     1644 +                                            which);
     1645 +                                        break;
     1646 +                                case NFS_V3:
     1647 +                                        exi_ksp = exp_kstats_v3(exi->exi_kstats,
     1648 +                                            which);
     1649 +                                        break;
     1650 +                                default:
     1651 +                                        ASSERT(0);
     1652 +                                        break;
     1653 +                                }
     1654 +                        }
     1655 +
     1656 +                        if (exi_ksp != NULL) {
     1657 +                                mutex_enter(exi_ksp->ks_lock);
     1658 +                                kstat_runq_enter(KSTAT_IO_PTR(exi_ksp));
     1659 +                                mutex_exit(exi_ksp->ks_lock);
     1660 +                        } else {
     1661 +                                rw_exit(&ne->exported_lock);
     1662 +                        }
     1663 +
     1664 +                        publicfh_ok = PUBLICFH_CHECK(ne, disp, exi, fsid, xfid);
1655 1665                          /*
1656 1666                           * Don't allow non-V4 clients access
1657 1667                           * to pseudo exports
1658 1668                           */
1659 1669                          if (PSEUDO(exi)) {
1660 1670                                  svcerr_weakauth(xprt);
1661 1671                                  error++;
1662 1672                                  goto done;
1663 1673                          }
1664 1674  
(91 lines elided)
1756 1766          }
1757 1767  
1758 1768          /*
1759 1769           * Check to see if logging has been enabled on the server.
1760 1770           * If so, then obtain the export info struct to be used for
1761 1771           * the later writing of the log record.  This is done for
1762 1772           * the case that a lookup is done across a non-logged public
1763 1773           * file system.
1764 1774           */
1765 1775          if (nfslog_buffer_list != NULL) {
1766      -                nfslog_exi = nfslog_get_exi(exi, req, res, &nfslog_rec_id);
     1776 +                nfslog_exi = nfslog_get_exi(ne, exi, req, res, &nfslog_rec_id);
1767 1777                  /*
1768 1778                   * Is logging enabled?
1769 1779                   */
1770 1780                  logging_enabled = (nfslog_exi != NULL);
1771 1781  
1772 1782                  /*
1773 1783                   * Copy the netbuf for logging purposes, before it is
1774 1784                   * freed by svc_sendreply().
1775 1785                   */
1776 1786                  if (logging_enabled) {
(16 lines elided)
1793 1803  #ifdef DEBUG
1794 1804          if (rfs_no_fast_xdrres == 0 && res != (char *)&res_buf)
1795 1805  #else
1796 1806          if (res != (char *)&res_buf)
1797 1807  #endif
1798 1808          {
1799 1809                  if (!svc_sendreply(xprt, disp->dis_fastxdrres, res)) {
1800 1810                          cmn_err(CE_NOTE, "%s: bad sendreply", pgmname);
1801 1811                          svcerr_systemerr(xprt);
1802 1812                          error++;
     1813 +                } else {
     1814 +                        rlen = xdr_sizeof(disp->dis_fastxdrres, res);
     1815 +                        rsent = TRUE;
1803 1816                  }
1804 1817          } else {
1805 1818                  if (!svc_sendreply(xprt, disp->dis_xdrres, res)) {
1806 1819                          cmn_err(CE_NOTE, "%s: bad sendreply", pgmname);
1807 1820                          svcerr_systemerr(xprt);
1808 1821                          error++;
     1822 +                } else {
     1823 +                        rlen = xdr_sizeof(disp->dis_xdrres, res);
     1824 +                        rsent = TRUE;
1809 1825                  }
1810 1826          }
1811 1827  
1812 1828          /*
1813 1829           * Log if needed
1814 1830           */
1815 1831          if (logging_enabled) {
1816 1832                  nfslog_write_record(nfslog_exi, req, args, (char *)&res_buf,
1817 1833                      cr, &nb, nfslog_rec_id, NFSLOG_ONE_BUFFER);
1818      -                exi_rele(nfslog_exi);
     1834 +                exi_rele(&nfslog_exi);
1819 1835                  kmem_free((&nb)->buf, (&nb)->len);
1820 1836          }
1821 1837  
1822 1838          /*
1823 1839           * Free results struct. With the addition of NFS V4 we can
1824 1840           * have non-idempotent procedures with functions.
1825 1841           */
1826 1842          if (disp->dis_resfree != nullfree && dupcached == FALSE) {
1827 1843                  (*disp->dis_resfree)(res);
1828 1844          }
1829 1845  
1830 1846  done:
     1847 +        if (ksp != NULL || exi_ksp != NULL) {
     1848 +                pos = XDR_GETPOS(&xprt->xp_xdrin) - pos;
     1849 +        }
     1850 +
1831 1851          /*
1832 1852           * Free arguments struct
1833 1853           */
1834 1854          if (disp) {
1835 1855                  if (!SVC_FREEARGS(xprt, disp->dis_xdrargs, args)) {
1836 1856                          cmn_err(CE_NOTE, "%s: bad freeargs", pgmname);
1837 1857                          error++;
1838 1858                  }
1839 1859          } else {
1840 1860                  if (!SVC_FREEARGS(xprt, (xdrproc_t)0, (caddr_t)0)) {
1841 1861                          cmn_err(CE_NOTE, "%s: bad freeargs", pgmname);
1842 1862                          error++;
1843 1863                  }
1844 1864          }
1845 1865  
     1866 +        if (exi_ksp != NULL) {
     1867 +                mutex_enter(exi_ksp->ks_lock);
     1868 +                KSTAT_IO_PTR(exi_ksp)->nwritten += pos;
     1869 +                KSTAT_IO_PTR(exi_ksp)->writes++;
     1870 +                if (rsent) {
     1871 +                        KSTAT_IO_PTR(exi_ksp)->nread += rlen;
     1872 +                        KSTAT_IO_PTR(exi_ksp)->reads++;
     1873 +                }
     1874 +                kstat_runq_exit(KSTAT_IO_PTR(exi_ksp));
     1875 +                mutex_exit(exi_ksp->ks_lock);
     1876 +
     1877 +                rw_exit(&ne->exported_lock);
     1878 +        }
     1879 +
1846 1880          if (exi != NULL)
1847      -                exi_rele(exi);
     1881 +                exi_rele(&exi);
1848 1882  
     1883 +        if (ksp != NULL) {
     1884 +                mutex_enter(ksp->ks_lock);
     1885 +                KSTAT_IO_PTR(ksp)->nwritten += pos;
     1886 +                KSTAT_IO_PTR(ksp)->writes++;
     1887 +                if (rsent) {
     1888 +                        KSTAT_IO_PTR(ksp)->nread += rlen;
     1889 +                        KSTAT_IO_PTR(ksp)->reads++;
     1890 +                }
     1891 +                kstat_runq_exit(KSTAT_IO_PTR(ksp));
     1892 +                mutex_exit(ksp->ks_lock);
     1893 +        }
     1894 +
1849 1895          global_svstat_ptr[req->rq_vers][NFS_BADCALLS].value.ui64 += error;
1850 1896  
1851 1897          global_svstat_ptr[req->rq_vers][NFS_CALLS].value.ui64++;
1852 1898  }
1853 1899  
1854 1900  static void
1855 1901  rfs_dispatch(struct svc_req *req, SVCXPRT *xprt)
1856 1902  {
1857 1903          common_dispatch(req, xprt, NFS_VERSMIN, NFS_VERSMAX,
1858 1904              "NFS", rfs_disptable);
(105 lines elided)
1964 2010          {acl3_getxattrdir,
1965 2011              xdr_GETXATTRDIR3args, NULL_xdrproc_t, sizeof (GETXATTRDIR3args),
1966 2012              xdr_GETXATTRDIR3res, NULL_xdrproc_t, sizeof (GETXATTRDIR3res),
1967 2013              nullfree, RPC_IDEMPOTENT,
1968 2014              acl3_getxattrdir_getfh},
1969 2015  };
1970 2016  
1971 2017  static struct rpc_disptable acl_disptable[] = {
1972 2018          {sizeof (acldisptab_v2) / sizeof (acldisptab_v2[0]),
1973 2019                  aclcallnames_v2,
1974      -                &aclproccnt_v2_ptr, acldisptab_v2},
     2020 +                &aclproccnt_v2_ptr, &aclprocio_v2_ptr, acldisptab_v2},
1975 2021          {sizeof (acldisptab_v3) / sizeof (acldisptab_v3[0]),
1976 2022                  aclcallnames_v3,
1977      -                &aclproccnt_v3_ptr, acldisptab_v3},
     2023 +                &aclproccnt_v3_ptr, &aclprocio_v3_ptr, acldisptab_v3},
1978 2024  };
1979 2025  
1980 2026  static void
1981 2027  acl_dispatch(struct svc_req *req, SVCXPRT *xprt)
1982 2028  {
1983 2029          common_dispatch(req, xprt, NFS_ACL_VERSMIN, NFS_ACL_VERSMAX,
1984 2030              "ACL", acl_disptable);
1985 2031  }
1986 2032  
1987 2033  int
(573 lines elided)
2561 2607          return (buf);
2562 2608  }
2563 2609  
2564 2610  /*
2565 2611   * NFS Server initialization routine.  This routine should only be called
2566 2612   * once.  It performs the following tasks:
2567 2613   *      - Call sub-initialization routines (localize access to variables)
2568 2614   *      - Initialize all locks
2569 2615   *      - initialize the version 3 write verifier
2570 2616   */
2571      -int
     2617 +void
2572 2618  nfs_srvinit(void)
2573 2619  {
2574      -        int error;
     2620 +        /* NFS server zone-specific global variables */
     2621 +        zone_key_create(&nfssrv_zone_key, nfs_srv_zone_init,
     2622 +            NULL, nfs_srv_zone_fini);
2575 2623  
2576      -        error = nfs_exportinit();
2577      -        if (error != 0)
2578      -                return (error);
2579      -        error = rfs4_srvrinit();
2580      -        if (error != 0) {
2581      -                nfs_exportfini();
2582      -                return (error);
2583      -        }
     2624 +        nfs_exportinit();
2584 2625          rfs_srvrinit();
2585 2626          rfs3_srvrinit();
     2627 +        rfs4_srvrinit();
2586 2628          nfsauth_init();
2587      -
2588      -        /* Init the stuff to control start/stop */
2589      -        nfs_server_upordown = NFS_SERVER_STOPPED;
2590      -        mutex_init(&nfs_server_upordown_lock, NULL, MUTEX_DEFAULT, NULL);
2591      -        cv_init(&nfs_server_upordown_cv, NULL, CV_DEFAULT, NULL);
2592      -        mutex_init(&rdma_wait_mutex, NULL, MUTEX_DEFAULT, NULL);
2593      -        cv_init(&rdma_wait_cv, NULL, CV_DEFAULT, NULL);
2594      -
2595      -        return (0);
2596 2629  }
2597 2630  
2598 2631  /*
2599 2632   * NFS Server finalization routine. This routine is called to cleanup the
2600 2633   * initialization work previously performed if the NFS server module could
2601 2634   * not be loaded correctly.
2602 2635   */
2603 2636  void
2604 2637  nfs_srvfini(void)
2605 2638  {
2606 2639          nfsauth_fini();
     2640 +        rfs4_srvrfini();
2607 2641          rfs3_srvrfini();
2608 2642          rfs_srvrfini();
2609 2643          nfs_exportfini();
2610 2644  
2611      -        mutex_destroy(&nfs_server_upordown_lock);
2612      -        cv_destroy(&nfs_server_upordown_cv);
2613      -        mutex_destroy(&rdma_wait_mutex);
2614      -        cv_destroy(&rdma_wait_cv);
     2645 +        (void) zone_key_delete(nfssrv_zone_key);
2615 2646  }
2616 2647  
     2648 +/* ARGSUSED */
     2649 +static void *
     2650 +nfs_srv_zone_init(zoneid_t zoneid)
     2651 +{
     2652 +        nfs_globals_t *ng;
     2653 +
     2654 +        ng = kmem_zalloc(sizeof (*ng), KM_SLEEP);
     2655 +
     2656 +        ng->nfs_versmin = NFS_VERSMIN_DEFAULT;
     2657 +        ng->nfs_versmax = NFS_VERSMAX_DEFAULT;
     2658 +
     2659 +        /* Init the stuff to control start/stop */
     2660 +        ng->nfs_server_upordown = NFS_SERVER_STOPPED;
     2661 +        mutex_init(&ng->nfs_server_upordown_lock, NULL, MUTEX_DEFAULT, NULL);
     2662 +        cv_init(&ng->nfs_server_upordown_cv, NULL, CV_DEFAULT, NULL);
     2663 +        mutex_init(&ng->rdma_wait_mutex, NULL, MUTEX_DEFAULT, NULL);
     2664 +        cv_init(&ng->rdma_wait_cv, NULL, CV_DEFAULT, NULL);
     2665 +
     2666 +        return (ng);
     2667 +}
     2668 +
     2669 +/* ARGSUSED */
     2670 +static void
     2671 +nfs_srv_zone_fini(zoneid_t zoneid, void *data)
     2672 +{
     2673 +        nfs_globals_t *ng;
     2674 +
     2675 +        ng = (nfs_globals_t *)data;
     2676 +        mutex_destroy(&ng->nfs_server_upordown_lock);
     2677 +        cv_destroy(&ng->nfs_server_upordown_cv);
     2678 +        mutex_destroy(&ng->rdma_wait_mutex);
     2679 +        cv_destroy(&ng->rdma_wait_cv);
     2680 +
     2681 +        kmem_free(ng, sizeof (*ng));
     2682 +}
     2683 +
2617 2684  /*
2618 2685   * Set up an iovec array of up to cnt pointers.
2619 2686   */
2620      -
2621 2687  void
2622 2688  mblk_to_iov(mblk_t *m, int cnt, struct iovec *iovp)
2623 2689  {
2624 2690          while (m != NULL && cnt-- > 0) {
2625 2691                  iovp->iov_base = (caddr_t)m->b_rptr;
2626 2692                  iovp->iov_len = (m->b_wptr - m->b_rptr);
2627 2693                  iovp++;
2628 2694                  m = m->b_cont;
2629 2695          }
2630 2696  }
(216 lines elided)
2847 2913                                  goto publicfh_done;
2848 2914  
2849 2915                          /*
2850 2916                           * Found a valid vp for index "filename". Sanity check
2851 2917                           * for odd case where a directory is provided as index
2852 2918                           * option argument and leads us to another filesystem
2853 2919                           */
2854 2920  
2855 2921                          /* Release the reference on the old exi value */
2856 2922                          ASSERT(*exi != NULL);
2857      -                        exi_rele(*exi);
     2923 +                        exi_rele(exi);
2858 2924  
2859 2925                          if (error = nfs_check_vpexi(mc_dvp, *vpp, kcred, exi)) {
2860 2926                                  VN_RELE(*vpp);
2861 2927                                  goto publicfh_done;
2862 2928                          }
2863 2929                  }
2864 2930          }
2865 2931  
2866 2932  publicfh_done:
2867 2933          if (mc_dvp)
(18 lines elided)
2886 2952          struct pathname pn;
2887 2953          int error;
2888 2954  
2889 2955          /*
2890 2956           * If pathname starts with '/', then set startdvp to root.
2891 2957           */
2892 2958          if (*path == '/') {
2893 2959                  while (*path == '/')
2894 2960                          path++;
2895 2961  
2896      -                startdvp = rootdir;
     2962 +                startdvp = ZONE_ROOTVP();
2897 2963          }
2898 2964  
2899 2965          error = pn_get_buf(path, UIO_SYSSPACE, &pn, namebuf, sizeof (namebuf));
2900 2966          if (error == 0) {
2901 2967                  /*
2902 2968                   * Call the URL parser for URL paths to modify the original
2903 2969                   * string to handle any '%' encoded characters that exist.
2904 2970                   * Done here to avoid an extra bcopy in the lookup.
2905 2971                   * We need to be careful about pathlen's. We know that
2906 2972                   * rfs_pathname() is called with a non-empty path. However,
(2 lines elided)
2909 2975                   * URL parser finding an encoded null character at the
2910 2976                   * beginning of path which should not proceed with the lookup.
2911 2977                   */
2912 2978                  if (pn.pn_pathlen != 0 && pathflag == URLPATH) {
2913 2979                          URLparse(pn.pn_path);
2914 2980                          if ((pn.pn_pathlen = strlen(pn.pn_path)) == 0)
2915 2981                                  return (ENOENT);
2916 2982                  }
2917 2983                  VN_HOLD(startdvp);
2918 2984                  error = lookuppnvp(&pn, NULL, NO_FOLLOW, dirvpp, compvpp,
2919      -                    rootdir, startdvp, cr);
     2985 +                    ZONE_ROOTVP(), startdvp, cr);
2920 2986          }
2921 2987          if (error == ENAMETOOLONG) {
2922 2988                  /*
2923 2989                   * This thread used a pathname > TYPICALMAXPATHLEN bytes long.
2924 2990                   */
2925 2991                  if (error = pn_get(path, UIO_SYSSPACE, &pn))
2926 2992                          return (error);
2927 2993                  if (pn.pn_pathlen != 0 && pathflag == URLPATH) {
2928 2994                          URLparse(pn.pn_path);
2929 2995                          if ((pn.pn_pathlen = strlen(pn.pn_path)) == 0) {
2930 2996                                  pn_free(&pn);
2931 2997                                  return (ENOENT);
2932 2998                          }
2933 2999                  }
2934 3000                  VN_HOLD(startdvp);
2935 3001                  error = lookuppnvp(&pn, NULL, NO_FOLLOW, dirvpp, compvpp,
2936      -                    rootdir, startdvp, cr);
     3002 +                    ZONE_ROOTVP(), startdvp, cr);
2937 3003                  pn_free(&pn);
2938 3004          }
2939 3005  
2940 3006          return (error);
2941 3007  }
2942 3008  
2943 3009  /*
2944 3010   * Adapt the multicomponent lookup path depending on the pathtype
2945 3011   */
2946 3012  static int
(83 lines elided)
3030 3096                   * must not terminate below the
3031 3097                   * exported directory.
3032 3098                   */
3033 3099                  if ((*exi)->exi_export.ex_flags & EX_NOSUB && walk > 0)
3034 3100                          error = EACCES;
3035 3101          }
3036 3102  
3037 3103          return (error);
3038 3104  }
3039 3105  
3040      -/*
3041      - * Do the main work of handling HA-NFSv4 Resource Group failover on
3042      - * Sun Cluster.
3043      - * We need to detect whether any RG admin paths have been added or removed,
3044      - * and adjust resources accordingly.
3045      - * Currently we're using a very inefficient algorithm, ~ 2 * O(n**2). In
3046      - * order to scale, the list and array of paths need to be held in more
3047      - * suitable data structures.
3048      - */
3049      -static void
3050      -hanfsv4_failover(void)
3051      -{
3052      -        int i, start_grace, numadded_paths = 0;
3053      -        char **added_paths = NULL;
3054      -        rfs4_dss_path_t *dss_path;
3055      -
3056      -        /*
3057      -         * Note: currently, rfs4_dss_pathlist cannot be NULL, since
3058      -         * it will always include an entry for NFS4_DSS_VAR_DIR. If we
3059      -         * make the latter dynamically specified too, the following will
3060      -         * need to be adjusted.
3061      -         */
3062      -
3063      -        /*
3064      -         * First, look for removed paths: RGs that have been failed-over
3065      -         * away from this node.
3066      -         * Walk the "currently-serving" rfs4_dss_pathlist and, for each
3067      -         * path, check if it is on the "passed-in" rfs4_dss_newpaths array
3068      -         * from nfsd. If not, that RG path has been removed.
3069      -         *
3070      -         * Note that nfsd has sorted rfs4_dss_newpaths for us, and removed
3071      -         * any duplicates.
3072      -         */
3073      -        dss_path = rfs4_dss_pathlist;
3074      -        do {
3075      -                int found = 0;
3076      -                char *path = dss_path->path;
3077      -
3078      -                /* used only for non-HA so may not be removed */
3079      -                if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) {
3080      -                        dss_path = dss_path->next;
3081      -                        continue;
3082      -                }
3083      -
3084      -                for (i = 0; i < rfs4_dss_numnewpaths; i++) {
3085      -                        int cmpret;
3086      -                        char *newpath = rfs4_dss_newpaths[i];
3087      -
3088      -                        /*
3089      -                         * Since nfsd has sorted rfs4_dss_newpaths for us,
3090      -                         * once the return from strcmp is negative we know
3091      -                         * we've passed the point where "path" should be,
3092      -                         * and can stop searching: "path" has been removed.
3093      -                         */
3094      -                        cmpret = strcmp(path, newpath);
3095      -                        if (cmpret < 0)
3096      -                                break;
3097      -                        if (cmpret == 0) {
3098      -                                found = 1;
3099      -                                break;
3100      -                        }
3101      -                }
3102      -
3103      -                if (found == 0) {
3104      -                        unsigned index = dss_path->index;
3105      -                        rfs4_servinst_t *sip = dss_path->sip;
3106      -                        rfs4_dss_path_t *path_next = dss_path->next;
3107      -
3108      -                        /*
3109      -                         * This path has been removed.
3110      -                         * We must clear out the servinst reference to
3111      -                         * it, since it's now owned by another
3112      -                         * node: we should not attempt to touch it.
3113      -                         */
3114      -                        ASSERT(dss_path == sip->dss_paths[index]);
3115      -                        sip->dss_paths[index] = NULL;
3116      -
3117      -                        /* remove from "currently-serving" list, and destroy */
3118      -                        remque(dss_path);
3119      -                        /* allow for NUL */
3120      -                        kmem_free(dss_path->path, strlen(dss_path->path) + 1);
3121      -                        kmem_free(dss_path, sizeof (rfs4_dss_path_t));
3122      -
3123      -                        dss_path = path_next;
3124      -                } else {
3125      -                        /* path was found; not removed */
3126      -                        dss_path = dss_path->next;
3127      -                }
3128      -        } while (dss_path != rfs4_dss_pathlist);
3129      -
3130      -        /*
3131      -         * Now, look for added paths: RGs that have been failed-over
3132      -         * to this node.
3133      -         * Walk the "passed-in" rfs4_dss_newpaths array from nfsd and,
3134      -         * for each path, check if it is on the "currently-serving"
3135      -         * rfs4_dss_pathlist. If not, that RG path has been added.
3136      -         *
3137      -         * Note: we don't do duplicate detection here; nfsd does that for us.
3138      -         *
3139      -         * Note: numadded_paths <= rfs4_dss_numnewpaths, which gives us
3140      -         * an upper bound for the size needed for added_paths[numadded_paths].
3141      -         */
3142      -
3143      -        /* probably more space than we need, but guaranteed to be enough */
3144      -        if (rfs4_dss_numnewpaths > 0) {
3145      -                size_t sz = rfs4_dss_numnewpaths * sizeof (char *);
3146      -                added_paths = kmem_zalloc(sz, KM_SLEEP);
3147      -        }
3148      -
3149      -        /* walk the "passed-in" rfs4_dss_newpaths array from nfsd */
3150      -        for (i = 0; i < rfs4_dss_numnewpaths; i++) {
3151      -                int found = 0;
3152      -                char *newpath = rfs4_dss_newpaths[i];
3153      -
3154      -                dss_path = rfs4_dss_pathlist;
3155      -                do {
3156      -                        char *path = dss_path->path;
3157      -
3158      -                        /* used only for non-HA */
3159      -                        if (strcmp(path, NFS4_DSS_VAR_DIR) == 0) {
3160      -                                dss_path = dss_path->next;
3161      -                                continue;
3162      -                        }
3163      -
3164      -                        if (strncmp(path, newpath, strlen(path)) == 0) {
3165      -                                found = 1;
3166      -                                break;
3167      -                        }
3168      -
3169      -                        dss_path = dss_path->next;
3170      -                } while (dss_path != rfs4_dss_pathlist);
3171      -
3172      -                if (found == 0) {
3173      -                        added_paths[numadded_paths] = newpath;
3174      -                        numadded_paths++;
3175      -                }
3176      -        }
3177      -
3178      -        /* did we find any added paths? */
3179      -        if (numadded_paths > 0) {
3180      -                /* create a new server instance, and start its grace period */
3181      -                start_grace = 1;
3182      -                rfs4_servinst_create(start_grace, numadded_paths, added_paths);
3183      -
3184      -                /* read in the stable storage state from these paths */
3185      -                rfs4_dss_readstate(numadded_paths, added_paths);
3186      -
3187      -                /*
3188      -                 * Multiple failovers during a grace period will cause
3189      -                 * clients of the same resource group to be partitioned
3190      -                 * into different server instances, with different
3191      -                 * grace periods.  Since clients of the same resource
3192      -                 * group must be subject to the same grace period,
3193      -                 * we need to reset all currently active grace periods.
3194      -                 */
3195      -                rfs4_grace_reset_all();
3196      -        }
3197      -
3198      -        if (rfs4_dss_numnewpaths > 0)
3199      -                kmem_free(added_paths, rfs4_dss_numnewpaths * sizeof (char *));
3200      -}
3201      -
3202 3106  /*
3203 3107   * Used by NFSv3 and NFSv4 server to query label of
3204 3108   * a pathname component during lookup/access ops.
3205 3109   */
3206 3110  ts_label_t *
3207 3111  nfs_getflabel(vnode_t *vp, struct exportinfo *exi)
3208 3112  {
3209 3113          zone_t *zone;
3210 3114          ts_label_t *zone_label;
3211 3115          char *path;
(260 lines elided)