big-one Wdiff usr/src/uts/common/fs/zfs/zfs_vfsops.c

NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-9436 Rate limiting controls (was QoS) per ZFS dataset, updates from demo
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-8972 Async-delete side-effect that may cause unmount EBUSY
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-8852 Quality-of-Service (QoS) controls per NFS share
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5085 implement async delete for large files
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-3762 Appliance crashes with a NULL pointer dereference during a zpool export when a zfs_vn_rele_taskq thread attempts to check a bogus rwlock from rw_write_held
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
6160 /usr/lib/fs/zfs/bootinstall should use bootadm
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Adam Števko <adam.stevko@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (NULL is not an int)
6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
NEX-3485 Deferred deletes causing loss of service for NFS clients on cluster failover
Reviewed by: Marcel Telka <marcel.telka@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-2965 4.0.3-FP2: deferred deletes causing RSF import failure during fail-over of service
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
re #13253 rb4328 ssh: openssl version checking needs updating
re #11441 rb4292 panic in apic_record_rdt_entry on VMware hardware version 9
re #12619, rb4287 Deadlocked zfs txg processing in dsl_sync_task_group_sync()
re #13204 rb4280 zfs receive/rollback deadlock
re #6815 rb1758 need WORM in nza-kernel (4.0)
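
Note on NEX-9436: the diff below adds a rate_changed_cb() property callback that stores the new ZFS_PROP_RATE_LIMIT value in zfsvfs->z_rate.rate_cap, mapping UINT64_MAX to 0 first. The following is a minimal user-space sketch of that normalization only, not the kernel code. The types rate_state_t and fake_zfsvfs_t are hypothetical stand-ins, and reading 0 as "no cap" is an assumption that the diff itself does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel structures. Only the field
 * name rate_cap is taken from the diff; everything else is assumed.
 */
typedef struct rate_state {
	uint64_t rate_cap;	/* assumed: 0 means "no cap configured" */
} rate_state_t;

typedef struct fake_zfsvfs {
	rate_state_t z_rate;
} fake_zfsvfs_t;

/*
 * Same shape as the callback added in the diff: a property value of
 * UINT64_MAX is normalized to 0 before it is stored.
 */
static void
rate_changed_cb(void *arg, uint64_t newval)
{
	fake_zfsvfs_t *zfsvfs = arg;

	if (newval == UINT64_MAX)
		newval = 0;
	zfsvfs->z_rate.rate_cap = newval;
}
```

Normalizing in the callback means any consumer of rate_cap only has to compare against one sentinel value instead of two.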


          --- old/usr/src/uts/common/fs/zfs/zfs_vfsops.c
          +++ new/usr/src/uts/common/fs/zfs/zfs_vfsops.c

   1    1  /*
   2    2   * CDDL HEADER START
   3    3   *
   4    4   * The contents of this file are subject to the terms of the
   5    5   * Common Development and Distribution License (the "License").
   6    6   * You may not use this file except in compliance with the License.
   7    7   *
   8    8   * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9    9   * or http://www.opensolaris.org/os/licensing.
  10   10   * See the License for the specific language governing permissions
  11   11   * and limitations under the License.
  12   12   *
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  /*
  22   22   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  23   23   * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
  24   24   * Copyright (c) 2014 Integros [integros.com]
  25   25   * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
  26   26   */
  27   27  
  28   28  /* Portions Copyright 2010 Robert Milkowski */
  29   29  
  30   30  #include <sys/types.h>
  31   31  #include <sys/param.h>
  32   32  #include <sys/systm.h>
  33   33  #include <sys/sysmacros.h>
  34   34  #include <sys/kmem.h>
  35   35  #include <sys/pathname.h>
  36   36  #include <sys/vnode.h>
  37   37  #include <sys/vfs.h>

  38   38  #include <sys/vfs_opreg.h>
  39   39  #include <sys/mntent.h>
  40   40  #include <sys/mount.h>
  41   41  #include <sys/cmn_err.h>
  42   42  #include "fs/fs_subr.h"
  43   43  #include <sys/zfs_znode.h>
  44   44  #include <sys/zfs_dir.h>
  45   45  #include <sys/zil.h>
  46   46  #include <sys/fs/zfs.h>
  47   47  #include <sys/dmu.h>
       48 +#include <sys/dsl_dir.h>
  48   49  #include <sys/dsl_prop.h>
  49   50  #include <sys/dsl_dataset.h>
  50   51  #include <sys/dsl_deleg.h>
  51   52  #include <sys/spa.h>
  52   53  #include <sys/zap.h>
  53   54  #include <sys/sa.h>
  54   55  #include <sys/sa_impl.h>
  55   56  #include <sys/varargs.h>
  56   57  #include <sys/policy.h>
  57   58  #include <sys/atomic.h>

  58   59  #include <sys/mkdev.h>
  59   60  #include <sys/modctl.h>
  60   61  #include <sys/refstr.h>
  61   62  #include <sys/zfs_ioctl.h>
  62   63  #include <sys/zfs_ctldir.h>
  63   64  #include <sys/zfs_fuid.h>
  64   65  #include <sys/bootconf.h>
  65   66  #include <sys/sunddi.h>
  66   67  #include <sys/dnlc.h>
  67   68  #include <sys/dmu_objset.h>
  68   69  #include <sys/spa_boot.h>
  69   70  #include "zfs_comutil.h"
  70   71  
  71   72  int zfsfstype;
  72   73  vfsops_t *zfs_vfsops = NULL;
  73   74  static major_t zfs_major;
  74   75  static minor_t zfs_minor;
  75   76  static kmutex_t zfs_dev_mtx;
  76   77  
  77   78  extern int sys_shutdown;
  78   79  
  79   80  static int zfs_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr);
  80   81  static int zfs_umount(vfs_t *vfsp, int fflag, cred_t *cr);
  81   82  static int zfs_mountroot(vfs_t *vfsp, enum whymountroot);
  82   83  static int zfs_root(vfs_t *vfsp, vnode_t **vpp);
  83   84  static int zfs_statvfs(vfs_t *vfsp, struct statvfs64 *statp);
  84   85  static int zfs_vget(vfs_t *vfsp, vnode_t **vpp, fid_t *fidp);
  85   86  static void zfs_freevfs(vfs_t *vfsp);
  86   87  
  87   88  static const fs_operation_def_t zfs_vfsops_template[] = {
  88   89          VFSNAME_MOUNT,          { .vfs_mount = zfs_mount },
  89   90          VFSNAME_MOUNTROOT,      { .vfs_mountroot = zfs_mountroot },
  90   91          VFSNAME_UNMOUNT,        { .vfs_unmount = zfs_umount },
  91   92          VFSNAME_ROOT,           { .vfs_root = zfs_root },
  92   93          VFSNAME_STATVFS,        { .vfs_statvfs = zfs_statvfs },
  93   94          VFSNAME_SYNC,           { .vfs_sync = zfs_sync },
  94   95          VFSNAME_VGET,           { .vfs_vget = zfs_vget },
  95   96          VFSNAME_FREEVFS,        { .vfs_freevfs = zfs_freevfs },
  96   97          NULL,                   NULL
  97   98  };
  98   99  
  99  100  /*
 100  101   * We need to keep a count of active fs's.
 101  102   * This is necessary to prevent our module
 102  103   * from being unloaded after a umount -f
 103  104   */
 104  105  static uint32_t zfs_active_fs_count = 0;
 105  106  
 106  107  static char *noatime_cancel[] = { MNTOPT_ATIME, NULL };
 107  108  static char *atime_cancel[] = { MNTOPT_NOATIME, NULL };
 108  109  static char *noxattr_cancel[] = { MNTOPT_XATTR, NULL };
 109  110  static char *xattr_cancel[] = { MNTOPT_NOXATTR, NULL };
 110  111  
 111  112  /*
 112  113   * MO_DEFAULT is not used since the default value is determined
 113  114   * by the equivalent property.
 114  115   */
 115  116  static mntopt_t mntopts[] = {
 116  117          { MNTOPT_NOXATTR, noxattr_cancel, NULL, 0, NULL },
 117  118          { MNTOPT_XATTR, xattr_cancel, NULL, 0, NULL },
 118  119          { MNTOPT_NOATIME, noatime_cancel, NULL, 0, NULL },
 119  120          { MNTOPT_ATIME, atime_cancel, NULL, 0, NULL }
 120  121  };
 121  122  
 122  123  static mntopts_t zfs_mntopts = {
 123  124          sizeof (mntopts) / sizeof (mntopt_t),
 124  125          mntopts
 125  126  };
 126  127  
 127  128  /*ARGSUSED*/
 128  129  int
 129  130  zfs_sync(vfs_t *vfsp, short flag, cred_t *cr)
 130  131  {
 131  132          /*
 132  133           * Data integrity is job one.  We don't want a compromised kernel
 133  134           * writing to the storage pool, so we never sync during panic.
 134  135           */
 135  136          if (panicstr)
 136  137                  return (0);
 137  138  
 138  139          /*
 139  140           * SYNC_ATTR is used by fsflush() to force old filesystems like UFS
 140  141           * to sync metadata, which they would otherwise cache indefinitely.
 141  142           * Semantically, the only requirement is that the sync be initiated.
 142  143           * The DMU syncs out txgs frequently, so there's nothing to do.
 143  144           */
 144  145          if (flag & SYNC_ATTR)
 145  146                  return (0);
 146  147  
 147  148          if (vfsp != NULL) {
 148  149                  /*
 149  150                   * Sync a specific filesystem.
 150  151                   */
 151  152                  zfsvfs_t *zfsvfs = vfsp->vfs_data;
 152  153                  dsl_pool_t *dp;
 153  154  
 154  155                  ZFS_ENTER(zfsvfs);
 155  156                  dp = dmu_objset_pool(zfsvfs->z_os);
 156  157  
 157  158                  /*
 158  159                   * If the system is shutting down, then skip any
 159  160                   * filesystems which may exist on a suspended pool.
 160  161                   */
 161  162                  if (sys_shutdown && spa_suspended(dp->dp_spa)) {
 162  163                          ZFS_EXIT(zfsvfs);
 163  164                          return (0);
 164  165                  }
 165  166  
 166  167                  if (zfsvfs->z_log != NULL)
 167  168                          zil_commit(zfsvfs->z_log, 0);
 168  169  
 169  170                  ZFS_EXIT(zfsvfs);
 170  171          } else {
 171  172                  /*
 172  173                   * Sync all ZFS filesystems.  This is what happens when you
 173  174                   * run sync(1M).  Unlike other filesystems, ZFS honors the
 174  175                   * request by waiting for all pools to commit all dirty data.
 175  176                   */
 176  177                  spa_sync_allpools();
 177  178          }
 178  179  
 179  180          return (0);
 180  181  }
 181  182  
 182  183  static int
 183  184  zfs_create_unique_device(dev_t *dev)
 184  185  {
 185  186          major_t new_major;
 186  187  
 187  188          do {
 188  189                  ASSERT3U(zfs_minor, <=, MAXMIN32);
 189  190                  minor_t start = zfs_minor;
 190  191                  do {
 191  192                          mutex_enter(&zfs_dev_mtx);
 192  193                          if (zfs_minor >= MAXMIN32) {
 193  194                                  /*
 194  195                                   * If we're still using the real major
 195  196                                   * keep out of /dev/zfs and /dev/zvol minor
 196  197                                   * number space.  If we're using a getudev()'ed
 197  198                                   * major number, we can use all of its minors.
 198  199                                   */
 199  200                                  if (zfs_major == ddi_name_to_major(ZFS_DRIVER))
 200  201                                          zfs_minor = ZFS_MIN_MINOR;
 201  202                                  else
 202  203                                          zfs_minor = 0;
 203  204                          } else {
 204  205                                  zfs_minor++;
 205  206                          }
 206  207                          *dev = makedevice(zfs_major, zfs_minor);
 207  208                          mutex_exit(&zfs_dev_mtx);
 208  209                  } while (vfs_devismounted(*dev) && zfs_minor != start);
 209  210                  if (zfs_minor == start) {
 210  211                          /*
 211  212                           * We are using all ~262,000 minor numbers for the
 212  213                           * current major number.  Create a new major number.
 213  214                           */
 214  215                          if ((new_major = getudev()) == (major_t)-1) {
 215  216                                  cmn_err(CE_WARN,
 216  217                                      "zfs_mount: Can't get unique major "
 217  218                                      "device number.");
 218  219                                  return (-1);
 219  220                          }
 220  221                          mutex_enter(&zfs_dev_mtx);
 221  222                          zfs_major = new_major;
 222  223                          zfs_minor = 0;
 223  224  
 224  225                          mutex_exit(&zfs_dev_mtx);
 225  226                  } else {
 226  227                          break;
 227  228                  }
 228  229                  /* CONSTANTCONDITION */
 229  230          } while (1);
 230  231  
 231  232          return (0);
 232  233  }
 233  234  
 234  235  static void
 235  236  atime_changed_cb(void *arg, uint64_t newval)
 236  237  {
 237  238          zfsvfs_t *zfsvfs = arg;
 238  239  
 239  240          if (newval == TRUE) {
 240  241                  zfsvfs->z_atime = TRUE;
 241  242                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME);
 242  243                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_ATIME, NULL, 0);
 243  244          } else {
 244  245                  zfsvfs->z_atime = FALSE;
 245  246                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_ATIME);
 246  247                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME, NULL, 0);
 247  248          }
 248  249  }
 249  250  
 250  251  static void
 251  252  xattr_changed_cb(void *arg, uint64_t newval)
 252  253  {
 253  254          zfsvfs_t *zfsvfs = arg;
 254  255  
 255  256          if (newval == TRUE) {
 256  257                  /* XXX locking on vfs_flag? */
 257  258                  zfsvfs->z_vfs->vfs_flag |= VFS_XATTR;
 258  259                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR);
 259  260                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_XATTR, NULL, 0);
 260  261          } else {
 261  262                  /* XXX locking on vfs_flag? */
 262  263                  zfsvfs->z_vfs->vfs_flag &= ~VFS_XATTR;
 263  264                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_XATTR);
 264  265                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR, NULL, 0);
 265  266          }
 266  267  }
 267  268  
 268  269  static void
 269  270  blksz_changed_cb(void *arg, uint64_t newval)
 270  271  {
 271  272          zfsvfs_t *zfsvfs = arg;
 272  273          ASSERT3U(newval, <=, spa_maxblocksize(dmu_objset_spa(zfsvfs->z_os)));
 273  274          ASSERT3U(newval, >=, SPA_MINBLOCKSIZE);
 274  275          ASSERT(ISP2(newval));
 275  276  
 276  277          zfsvfs->z_max_blksz = newval;
 277  278          zfsvfs->z_vfs->vfs_bsize = newval;
 278  279  }
 279  280  
 280  281  static void
 281  282  readonly_changed_cb(void *arg, uint64_t newval)
 282  283  {
 283  284          zfsvfs_t *zfsvfs = arg;
 284  285  
 285  286          if (newval) {
 286  287                  /* XXX locking on vfs_flag? */
 287  288                  zfsvfs->z_vfs->vfs_flag |= VFS_RDONLY;
 288  289                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RW);
 289  290                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RO, NULL, 0);
 290  291          } else {
 291  292                  /* XXX locking on vfs_flag? */
 292  293                  zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
 293  294                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RO);
 294  295                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RW, NULL, 0);
 295  296          }
 296  297  }
 297  298  
 298  299  static void
 299  300  devices_changed_cb(void *arg, uint64_t newval)
 300  301  {
 301  302          zfsvfs_t *zfsvfs = arg;
 302  303  
 303  304          if (newval == FALSE) {
 304  305                  zfsvfs->z_vfs->vfs_flag |= VFS_NODEVICES;
 305  306                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_DEVICES);
 306  307                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NODEVICES, NULL, 0);
 307  308          } else {
 308  309                  zfsvfs->z_vfs->vfs_flag &= ~VFS_NODEVICES;
 309  310                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NODEVICES);
 310  311                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_DEVICES, NULL, 0);
 311  312          }
 312  313  }
 313  314  
 314  315  static void
 315  316  setuid_changed_cb(void *arg, uint64_t newval)
 316  317  {
 317  318          zfsvfs_t *zfsvfs = arg;
 318  319  
 319  320          if (newval == FALSE) {
 320  321                  zfsvfs->z_vfs->vfs_flag |= VFS_NOSETUID;
 321  322                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_SETUID);
 322  323                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID, NULL, 0);
 323  324          } else {
 324  325                  zfsvfs->z_vfs->vfs_flag &= ~VFS_NOSETUID;
 325  326                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID);
 326  327                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_SETUID, NULL, 0);
 327  328          }
 328  329  }
 329  330  
 330  331  static void
 331  332  exec_changed_cb(void *arg, uint64_t newval)
 332  333  {
 333  334          zfsvfs_t *zfsvfs = arg;
 334  335  
 335  336          if (newval == FALSE) {
 336  337                  zfsvfs->z_vfs->vfs_flag |= VFS_NOEXEC;
 337  338                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_EXEC);
 338  339                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC, NULL, 0);
 339  340          } else {
 340  341                  zfsvfs->z_vfs->vfs_flag &= ~VFS_NOEXEC;
 341  342                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC);
 342  343                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_EXEC, NULL, 0);
 343  344          }
 344  345  }
 345  346  
 346  347  /*
 347  348   * The nbmand mount option can be changed at mount time.
 348  349   * We can't allow it to be toggled on live file systems or incorrect
 349  350   * behavior may be seen from cifs clients
 350  351   *
 351  352   * This property isn't registered via dsl_prop_register(), but this callback
 352  353   * will be called when a file system is first mounted
 353  354   */
 354  355  static void
 355  356  nbmand_changed_cb(void *arg, uint64_t newval)
 356  357  {
 357  358          zfsvfs_t *zfsvfs = arg;
 358  359          if (newval == FALSE) {
 359  360                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND);
 360  361                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND, NULL, 0);
 361  362          } else {
 362  363                  vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND);
 363  364                  vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND, NULL, 0);
 364  365          }
 365  366  }
 366  367  
 367  368  static void
 368  369  snapdir_changed_cb(void *arg, uint64_t newval)
 369  370  {
 370  371          zfsvfs_t *zfsvfs = arg;
 371  372  
 372  373          zfsvfs->z_show_ctldir = newval;
 373  374  }
 374  375  
 375  376  static void
 376  377  vscan_changed_cb(void *arg, uint64_t newval)
 377  378  {
 378  379          zfsvfs_t *zfsvfs = arg;
 379  380  
 380  381          zfsvfs->z_vscan = newval;
 381  382  }
 382  383  
 383  384  static void
 384  385  acl_mode_changed_cb(void *arg, uint64_t newval)
 385  386  {
 386  387          zfsvfs_t *zfsvfs = arg;
 387  388  
 388  389          zfsvfs->z_acl_mode = newval;

 389  390  }
 390  391  
 391  392  static void
 392  393  acl_inherit_changed_cb(void *arg, uint64_t newval)
 393  394  {
 394  395          zfsvfs_t *zfsvfs = arg;
 395  396  
 396  397          zfsvfs->z_acl_inherit = newval;
 397  398  }
 398  399  
      400 +static void
      401 +rate_changed_cb(void *arg, uint64_t newval)
      402 +{
      403 +        zfsvfs_t *zfsvfs = arg;
      404 +
      405 +        if (newval == UINT64_MAX)
      406 +                newval = 0;
      407 +        zfsvfs->z_rate.rate_cap = newval;
      408 +}
      409 +
 399  410  static int
 400  411  zfs_register_callbacks(vfs_t *vfsp)
 401  412  {
 402  413          struct dsl_dataset *ds = NULL;
 403  414          objset_t *os = NULL;
 404  415          zfsvfs_t *zfsvfs = NULL;
 405  416          uint64_t nbmand;
 406  417          boolean_t readonly = B_FALSE;
 407  418          boolean_t do_readonly = B_FALSE;
 408  419          boolean_t setuid = B_FALSE;

 409  420          boolean_t do_setuid = B_FALSE;
 410  421          boolean_t exec = B_FALSE;
 411  422          boolean_t do_exec = B_FALSE;
 412  423          boolean_t devices = B_FALSE;
 413  424          boolean_t do_devices = B_FALSE;
 414  425          boolean_t xattr = B_FALSE;
 415  426          boolean_t do_xattr = B_FALSE;
 416  427          boolean_t atime = B_FALSE;
 417  428          boolean_t do_atime = B_FALSE;
 418  429          int error = 0;
 419  430  
 420  431          ASSERT(vfsp);
 421  432          zfsvfs = vfsp->vfs_data;
 422  433          ASSERT(zfsvfs);
 423  434          os = zfsvfs->z_os;
 424  435  
 425  436          /*
 426  437           * The act of registering our callbacks will destroy any mount
 427  438           * options we may have.  In order to enable temporary overrides
 428  439           * of mount options, we stash away the current values and
 429  440           * restore them after we register the callbacks.
 430  441           */
 431  442          if (vfs_optionisset(vfsp, MNTOPT_RO, NULL) ||
 432  443              !spa_writeable(dmu_objset_spa(os))) {
 433  444                  readonly = B_TRUE;
 434  445                  do_readonly = B_TRUE;
 435  446          } else if (vfs_optionisset(vfsp, MNTOPT_RW, NULL)) {
 436  447                  readonly = B_FALSE;
 437  448                  do_readonly = B_TRUE;
 438  449          }
 439  450          if (vfs_optionisset(vfsp, MNTOPT_NOSUID, NULL)) {
 440  451                  devices = B_FALSE;
 441  452                  setuid = B_FALSE;
 442  453                  do_devices = B_TRUE;
 443  454                  do_setuid = B_TRUE;
 444  455          } else {
 445  456                  if (vfs_optionisset(vfsp, MNTOPT_NODEVICES, NULL)) {
 446  457                          devices = B_FALSE;
 447  458                          do_devices = B_TRUE;
 448  459                  } else if (vfs_optionisset(vfsp, MNTOPT_DEVICES, NULL)) {
 449  460                          devices = B_TRUE;
 450  461                          do_devices = B_TRUE;
 451  462                  }
 452  463  
 453  464                  if (vfs_optionisset(vfsp, MNTOPT_NOSETUID, NULL)) {
 454  465                          setuid = B_FALSE;
 455  466                          do_setuid = B_TRUE;
 456  467                  } else if (vfs_optionisset(vfsp, MNTOPT_SETUID, NULL)) {
 457  468                          setuid = B_TRUE;
 458  469                          do_setuid = B_TRUE;
 459  470                  }
 460  471          }
 461  472          if (vfs_optionisset(vfsp, MNTOPT_NOEXEC, NULL)) {
 462  473                  exec = B_FALSE;
 463  474                  do_exec = B_TRUE;
 464  475          } else if (vfs_optionisset(vfsp, MNTOPT_EXEC, NULL)) {
 465  476                  exec = B_TRUE;
 466  477                  do_exec = B_TRUE;
 467  478          }
 468  479          if (vfs_optionisset(vfsp, MNTOPT_NOXATTR, NULL)) {
 469  480                  xattr = B_FALSE;
 470  481                  do_xattr = B_TRUE;
 471  482          } else if (vfs_optionisset(vfsp, MNTOPT_XATTR, NULL)) {
 472  483                  xattr = B_TRUE;
 473  484                  do_xattr = B_TRUE;
 474  485          }
 475  486          if (vfs_optionisset(vfsp, MNTOPT_NOATIME, NULL)) {
 476  487                  atime = B_FALSE;
 477  488                  do_atime = B_TRUE;
 478  489          } else if (vfs_optionisset(vfsp, MNTOPT_ATIME, NULL)) {
 479  490                  atime = B_TRUE;
 480  491                  do_atime = B_TRUE;
 481  492          }
 482  493  
 483  494          /*
 484  495           * nbmand is a special property.  It can only be changed at
 485  496           * mount time.
 486  497           *
 487  498           * This is weird, but it is documented to only be changeable
 488  499           * at mount time.
 489  500           */
 490  501          if (vfs_optionisset(vfsp, MNTOPT_NONBMAND, NULL)) {
 491  502                  nbmand = B_FALSE;
 492  503          } else if (vfs_optionisset(vfsp, MNTOPT_NBMAND, NULL)) {
 493  504                  nbmand = B_TRUE;
 494  505          } else {
 495  506                  char osname[ZFS_MAX_DATASET_NAME_LEN];
 496  507  
 497  508                  dmu_objset_name(os, osname);
 498  509                  if (error = dsl_prop_get_integer(osname, "nbmand", &nbmand,
 499  510                      NULL)) {
 500  511                          return (error);
 501  512                  }
 502  513          }
 503  514  
 504  515          /*
 505  516           * Register property callbacks.
 506  517           *
 507  518           * It would probably be fine to just check for i/o error from
 508  519           * the first prop_register(), but I guess I like to go
 509  520           * overboard...
 510  521           */
 511  522          ds = dmu_objset_ds(os);
 512  523          dsl_pool_config_enter(dmu_objset_pool(os), FTAG);
 513  524          error = dsl_prop_register(ds,
 514  525              zfs_prop_to_name(ZFS_PROP_ATIME), atime_changed_cb, zfsvfs);
 515  526          error = error ? error : dsl_prop_register(ds,
 516  527              zfs_prop_to_name(ZFS_PROP_XATTR), xattr_changed_cb, zfsvfs);
 517  528          error = error ? error : dsl_prop_register(ds,
 518  529              zfs_prop_to_name(ZFS_PROP_RECORDSIZE), blksz_changed_cb, zfsvfs);
 519  530          error = error ? error : dsl_prop_register(ds,
 520  531              zfs_prop_to_name(ZFS_PROP_READONLY), readonly_changed_cb, zfsvfs);
 521  532          error = error ? error : dsl_prop_register(ds,
 522  533              zfs_prop_to_name(ZFS_PROP_DEVICES), devices_changed_cb, zfsvfs);
 523  534          error = error ? error : dsl_prop_register(ds,
 524  535              zfs_prop_to_name(ZFS_PROP_SETUID), setuid_changed_cb, zfsvfs);
 525  536          error = error ? error : dsl_prop_register(ds,

 526  537              zfs_prop_to_name(ZFS_PROP_EXEC), exec_changed_cb, zfsvfs);
 527  538          error = error ? error : dsl_prop_register(ds,
 528  539              zfs_prop_to_name(ZFS_PROP_SNAPDIR), snapdir_changed_cb, zfsvfs);
 529  540          error = error ? error : dsl_prop_register(ds,
 530  541              zfs_prop_to_name(ZFS_PROP_ACLMODE), acl_mode_changed_cb, zfsvfs);
 531  542          error = error ? error : dsl_prop_register(ds,
 532  543              zfs_prop_to_name(ZFS_PROP_ACLINHERIT), acl_inherit_changed_cb,
 533  544              zfsvfs);
 534  545          error = error ? error : dsl_prop_register(ds,
 535  546              zfs_prop_to_name(ZFS_PROP_VSCAN), vscan_changed_cb, zfsvfs);
      547 +        error = error ? error : dsl_prop_register(ds,
      548 +            zfs_prop_to_name(ZFS_PROP_RATE_LIMIT), rate_changed_cb, zfsvfs);
      549 +
 536  550          dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
 537  551          if (error)
 538  552                  goto unregister;
 539  553  
 540  554          /*
 541  555           * Invoke our callbacks to restore temporary mount options.
 542  556           */
 543  557          if (do_readonly)
 544  558                  readonly_changed_cb(zfsvfs, readonly);
 545  559          if (do_setuid)

 546  560                  setuid_changed_cb(zfsvfs, setuid);
 547  561          if (do_exec)
 548  562                  exec_changed_cb(zfsvfs, exec);
 549  563          if (do_devices)
 550  564                  devices_changed_cb(zfsvfs, devices);
 551  565          if (do_xattr)
 552  566                  xattr_changed_cb(zfsvfs, xattr);
 553  567          if (do_atime)
 554  568                  atime_changed_cb(zfsvfs, atime);
 555  569  
 556  570          nbmand_changed_cb(zfsvfs, nbmand);
 557  571  
 558  572          return (0);
 559  573  
 560  574  unregister:
 561  575          dsl_prop_unregister_all(ds, zfsvfs);
 562  576          return (error);
 563  577  }
 564  578  
 565  579  static int
 566  580  zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
 567  581      uint64_t *userp, uint64_t *groupp)
 568  582  {
 569  583          /*
 570  584           * Is it a valid type of object to track?
 571  585           */
 572  586          if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA)
 573  587                  return (SET_ERROR(ENOENT));
 574  588  
 575  589          /*
 576  590           * If we have a NULL data pointer
 577  591           * then assume the id's aren't changing and
 578  592           * return EEXIST to the dmu to let it know to
 579  593           * use the same ids
 580  594           */
 581  595          if (data == NULL)
 582  596                  return (SET_ERROR(EEXIST));
 583  597  
 584  598          if (bonustype == DMU_OT_ZNODE) {
 585  599                  znode_phys_t *znp = data;
 586  600                  *userp = znp->zp_uid;
 587  601                  *groupp = znp->zp_gid;
 588  602          } else {
 589  603                  int hdrsize;
 590  604                  sa_hdr_phys_t *sap = data;
 591  605                  sa_hdr_phys_t sa = *sap;
 592  606                  boolean_t swap = B_FALSE;
 593  607  
 594  608                  ASSERT(bonustype == DMU_OT_SA);
 595  609  
 596  610                  if (sa.sa_magic == 0) {
 597  611                          /*
 598  612                           * This should only happen for newly created
 599  613                           * files that haven't had the znode data filled
 600  614                           * in yet.
 601  615                           */
 602  616                          *userp = 0;
 603  617                          *groupp = 0;
 604  618                          return (0);
 605  619                  }
 606  620                  if (sa.sa_magic == BSWAP_32(SA_MAGIC)) {
 607  621                          sa.sa_magic = SA_MAGIC;
 608  622                          sa.sa_layout_info = BSWAP_16(sa.sa_layout_info);
 609  623                          swap = B_TRUE;
 610  624                  } else {
 611  625                          VERIFY3U(sa.sa_magic, ==, SA_MAGIC);
 612  626                  }
 613  627  
 614  628                  hdrsize = sa_hdrsize(&sa);
 615  629                  VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t));
 616  630                  *userp = *((uint64_t *)((uintptr_t)data + hdrsize +
 617  631                      SA_UID_OFFSET));
 618  632                  *groupp = *((uint64_t *)((uintptr_t)data + hdrsize +
 619  633                      SA_GID_OFFSET));
 620  634                  if (swap) {
 621  635                          *userp = BSWAP_64(*userp);
 622  636                          *groupp = BSWAP_64(*groupp);
 623  637                  }
 624  638          }
 625  639          return (0);
 626  640  }
 627  641  
 628  642  static void
 629  643  fuidstr_to_sid(zfsvfs_t *zfsvfs, const char *fuidstr,
 630  644      char *domainbuf, int buflen, uid_t *ridp)
 631  645  {
 632  646          uint64_t fuid;
 633  647          const char *domain;
 634  648  
 635  649          fuid = zfs_strtonum(fuidstr, NULL);
 636  650  
 637  651          domain = zfs_fuid_find_by_idx(zfsvfs, FUID_INDEX(fuid));
 638  652          if (domain)
 639  653                  (void) strlcpy(domainbuf, domain, buflen);
 640  654          else
 641  655                  domainbuf[0] = '\0';
 642  656          *ridp = FUID_RID(fuid);
 643  657  }
 644  658  
 645  659  static uint64_t
 646  660  zfs_userquota_prop_to_obj(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type)
 647  661  {
 648  662          switch (type) {
 649  663          case ZFS_PROP_USERUSED:
 650  664                  return (DMU_USERUSED_OBJECT);
 651  665          case ZFS_PROP_GROUPUSED:
 652  666                  return (DMU_GROUPUSED_OBJECT);
 653  667          case ZFS_PROP_USERQUOTA:
 654  668                  return (zfsvfs->z_userquota_obj);
 655  669          case ZFS_PROP_GROUPQUOTA:
 656  670                  return (zfsvfs->z_groupquota_obj);
 657  671          }
 658  672          return (0);
 659  673  }
 660  674  
 661  675  int
 662  676  zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
 663  677      uint64_t *cookiep, void *vbuf, uint64_t *bufsizep)
 664  678  {
 665  679          int error;
 666  680          zap_cursor_t zc;
 667  681          zap_attribute_t za;
 668  682          zfs_useracct_t *buf = vbuf;
 669  683          uint64_t obj;
 670  684  
 671  685          if (!dmu_objset_userspace_present(zfsvfs->z_os))
 672  686                  return (SET_ERROR(ENOTSUP));
 673  687  
 674  688          obj = zfs_userquota_prop_to_obj(zfsvfs, type);
 675  689          if (obj == 0) {
 676  690                  *bufsizep = 0;
 677  691                  return (0);
 678  692          }
 679  693  
 680  694          for (zap_cursor_init_serialized(&zc, zfsvfs->z_os, obj, *cookiep);
 681  695              (error = zap_cursor_retrieve(&zc, &za)) == 0;
 682  696              zap_cursor_advance(&zc)) {
 683  697                  if ((uintptr_t)buf - (uintptr_t)vbuf + sizeof (zfs_useracct_t) >
 684  698                      *bufsizep)
 685  699                          break;
 686  700  
 687  701                  fuidstr_to_sid(zfsvfs, za.za_name,
 688  702                      buf->zu_domain, sizeof (buf->zu_domain), &buf->zu_rid);
 689  703  
 690  704                  buf->zu_space = za.za_first_integer;
 691  705                  buf++;
 692  706          }
 693  707          if (error == ENOENT)
 694  708                  error = 0;
 695  709  
 696  710          ASSERT3U((uintptr_t)buf - (uintptr_t)vbuf, <=, *bufsizep);
 697  711          *bufsizep = (uintptr_t)buf - (uintptr_t)vbuf;
 698  712          *cookiep = zap_cursor_serialize(&zc);
 699  713          zap_cursor_fini(&zc);
 700  714          return (error);
 701  715  }
 702  716  
 703  717  /*
 704  718   * buf must be big enough (e.g., 32 bytes)
 705  719   */
 706  720  static int
 707  721  id_to_fuidstr(zfsvfs_t *zfsvfs, const char *domain, uid_t rid,
 708  722      char *buf, boolean_t addok)
 709  723  {
 710  724          uint64_t fuid;
 711  725          int domainid = 0;
 712  726  
 713  727          if (domain && domain[0]) {
 714  728                  domainid = zfs_fuid_find_by_domain(zfsvfs, domain, NULL, addok);
 715  729                  if (domainid == -1)
 716  730                          return (SET_ERROR(ENOENT));
 717  731          }
 718  732          fuid = FUID_ENCODE(domainid, rid);
 719  733          (void) sprintf(buf, "%llx", (longlong_t)fuid);
 720  734          return (0);
 721  735  }
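
As a rough illustration of the key format that id_to_fuidstr() writes and fuidstr_to_sid() reads back, the standalone sketch below packs a domain-table index and a RID into one 64-bit value and formats it as a hex string. All sketch_* names are hypothetical. The 32/32 bit split is an assumption modeled on the FUID_ENCODE, FUID_INDEX, and FUID_RID macros, not a copy of the kernel headers, and strtoull() stands in for zfs_strtonum().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed layout: domain-table index in the high 32 bits, RID in the low 32. */
static uint64_t
sketch_fuid_encode(uint32_t idx, uint32_t rid)
{
	return (((uint64_t)idx << 32) | rid);
}

/* Recover the domain-table index (hypothetical stand-in for FUID_INDEX). */
static uint32_t
sketch_fuid_index(uint64_t fuid)
{
	return ((uint32_t)(fuid >> 32));
}

/* Recover the RID (hypothetical stand-in for FUID_RID). */
static uint32_t
sketch_fuid_rid(uint64_t fuid)
{
	return ((uint32_t)(fuid & 0xffffffffULL));
}

/*
 * Format the FUID as a lowercase hex key. A 64-bit value needs at most
 * 16 hex digits plus the terminating NUL, so 17 bytes always suffice.
 */
static void
sketch_fuid_to_key(uint64_t fuid, char *buf, size_t buflen)
{
	(void) snprintf(buf, buflen, "%" PRIx64, fuid);
}

/* Parse a hex key back into a FUID. */
static uint64_t
sketch_key_to_fuid(const char *key)
{
	return ((uint64_t)strtoull(key, NULL, 16));
}
```

Under these assumptions a 32-byte buffer, as used by the callers in this file, leaves ample headroom for the longest possible key.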
 722  736  
 723  737  int
 724  738  zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
 725  739      const char *domain, uint64_t rid, uint64_t *valp)
 726  740  {
 727  741          char buf[32];
 728  742          int err;
 729  743          uint64_t obj;
 730  744  
 731  745          *valp = 0;
 732  746  
 733  747          if (!dmu_objset_userspace_present(zfsvfs->z_os))
 734  748                  return (SET_ERROR(ENOTSUP));
 735  749  
 736  750          obj = zfs_userquota_prop_to_obj(zfsvfs, type);
 737  751          if (obj == 0)
 738  752                  return (0);
 739  753  
 740  754          err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_FALSE);
 741  755          if (err)
 742  756                  return (err);
 743  757  
 744  758          err = zap_lookup(zfsvfs->z_os, obj, buf, 8, 1, valp);
 745  759          if (err == ENOENT)
 746  760                  err = 0;
 747  761          return (err);
 748  762  }
 749  763  
 750  764  int
 751  765  zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
 752  766      const char *domain, uint64_t rid, uint64_t quota)
 753  767  {
 754  768          char buf[32];
 755  769          int err;
 756  770          dmu_tx_t *tx;
 757  771          uint64_t *objp;
 758  772          boolean_t fuid_dirtied;
 759  773  
 760  774          if (type != ZFS_PROP_USERQUOTA && type != ZFS_PROP_GROUPQUOTA)
 761  775                  return (SET_ERROR(EINVAL));
 762  776  
 763  777          if (zfsvfs->z_version < ZPL_VERSION_USERSPACE)
 764  778                  return (SET_ERROR(ENOTSUP));
 765  779  
 766  780          objp = (type == ZFS_PROP_USERQUOTA) ? &zfsvfs->z_userquota_obj :
 767  781              &zfsvfs->z_groupquota_obj;
 768  782  
 769  783          err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_TRUE);
 770  784          if (err)
 771  785                  return (err);
 772  786          fuid_dirtied = zfsvfs->z_fuid_dirty;
 773  787  
 774  788          tx = dmu_tx_create(zfsvfs->z_os);
 775  789          dmu_tx_hold_zap(tx, *objp ? *objp : DMU_NEW_OBJECT, B_TRUE, NULL);
 776  790          if (*objp == 0) {
 777  791                  dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE,
 778  792                      zfs_userquota_prop_prefixes[type]);
 779  793          }
 780  794          if (fuid_dirtied)
 781  795                  zfs_fuid_txhold(zfsvfs, tx);
 782  796          err = dmu_tx_assign(tx, TXG_WAIT);
 783  797          if (err) {
 784  798                  dmu_tx_abort(tx);
 785  799                  return (err);
 786  800          }
 787  801  
 788  802          mutex_enter(&zfsvfs->z_lock);
 789  803          if (*objp == 0) {
 790  804                  *objp = zap_create(zfsvfs->z_os, DMU_OT_USERGROUP_QUOTA,
 791  805                      DMU_OT_NONE, 0, tx);
 792  806                  VERIFY(0 == zap_add(zfsvfs->z_os, MASTER_NODE_OBJ,
 793  807                      zfs_userquota_prop_prefixes[type], 8, 1, objp, tx));
 794  808          }
 795  809          mutex_exit(&zfsvfs->z_lock);
 796  810  
 797  811          if (quota == 0) {
 798  812                  err = zap_remove(zfsvfs->z_os, *objp, buf, tx);
 799  813                  if (err == ENOENT)
 800  814                          err = 0;
 801  815          } else {
 802  816                  err = zap_update(zfsvfs->z_os, *objp, buf, 8, 1, &quota, tx);
 803  817          }
 804  818          ASSERT(err == 0);
 805  819          if (fuid_dirtied)
 806  820                  zfs_fuid_sync(zfsvfs, tx);
 807  821          dmu_tx_commit(tx);
 808  822          return (err);
 809  823  }
 810  824  
 811  825  boolean_t
 812  826  zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup, uint64_t fuid)
 813  827  {
 814  828          char buf[32];
 815  829          uint64_t used, quota, usedobj, quotaobj;
 816  830          int err;
 817  831  
 818  832          usedobj = isgroup ? DMU_GROUPUSED_OBJECT : DMU_USERUSED_OBJECT;
 819  833          quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj;
 820  834  
 821  835          if (quotaobj == 0 || zfsvfs->z_replay)
 822  836                  return (B_FALSE);
 823  837  
 824  838          (void) sprintf(buf, "%llx", (longlong_t)fuid);
 825  839          err = zap_lookup(zfsvfs->z_os, quotaobj, buf, 8, 1, &quota);
 826  840          if (err != 0)
 827  841                  return (B_FALSE);
 828  842  
 829  843          err = zap_lookup(zfsvfs->z_os, usedobj, buf, 8, 1, &used);
 830  844          if (err != 0)
 831  845                  return (B_FALSE);
 832  846          return (used >= quota);
 833  847  }
 834  848  
 835  849  boolean_t
 836  850  zfs_owner_overquota(zfsvfs_t *zfsvfs, znode_t *zp, boolean_t isgroup)
 837  851  {
 838  852          uint64_t fuid;
 839  853          uint64_t quotaobj;
 840  854  
 841  855          quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj;
 842  856  
 843  857          fuid = isgroup ? zp->z_gid : zp->z_uid;
 844  858  
 845  859          if (quotaobj == 0 || zfsvfs->z_replay)
 846  860                  return (B_FALSE);
 847  861  
 848  862          return (zfs_fuid_overquota(zfsvfs, isgroup, fuid));
 849  863  }
 850  864  
 851  865  /*
 852  866   * Associate this zfsvfs with the given objset, which must be owned.
 853  867   * This will cache a bunch of on-disk state from the objset in the
 854  868   * zfsvfs.
 855  869   */
 856  870  static int
 857  871  zfsvfs_init(zfsvfs_t *zfsvfs, objset_t *os)
 858  872  {
 859  873          int error;
 860  874          uint64_t val;
 861  875  
 862  876          zfsvfs->z_max_blksz = SPA_OLD_MAXBLOCKSIZE;
 863  877          zfsvfs->z_show_ctldir = ZFS_SNAPDIR_VISIBLE;
 864  878          zfsvfs->z_os = os;
 865  879  
 866  880          error = zfs_get_zplprop(os, ZFS_PROP_VERSION, &zfsvfs->z_version);
 867  881          if (error != 0)
 868  882                  return (error);
 869  883          if (zfsvfs->z_version >
 870  884              zfs_zpl_version_map(spa_version(dmu_objset_spa(os)))) {
 871  885                  (void) printf("Can't mount a version %lld file system "
 872  886                      "on a version %lld pool\n. Pool must be upgraded to mount "
 873  887                      "this file system.", (u_longlong_t)zfsvfs->z_version,
 874  888                      (u_longlong_t)spa_version(dmu_objset_spa(os)));
 875  889                  return (SET_ERROR(ENOTSUP));
 876  890          }
 877  891          error = zfs_get_zplprop(os, ZFS_PROP_NORMALIZE, &val);
 878  892          if (error != 0)
 879  893                  return (error);
 880  894          zfsvfs->z_norm = (int)val;
 881  895  
 882  896          error = zfs_get_zplprop(os, ZFS_PROP_UTF8ONLY, &val);
 883  897          if (error != 0)
 884  898                  return (error);
 885  899          zfsvfs->z_utf8 = (val != 0);
 886  900  
 887  901          error = zfs_get_zplprop(os, ZFS_PROP_CASE, &val);
 888  902          if (error != 0)
 889  903                  return (error);
 890  904          zfsvfs->z_case = (uint_t)val;
 891  905  
 892  906          /*
 893  907           * Fold case on file systems that are always or sometimes case
 894  908           * insensitive.
 895  909           */
 896  910          if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE ||
 897  911              zfsvfs->z_case == ZFS_CASE_MIXED)
 898  912                  zfsvfs->z_norm |= U8_TEXTPREP_TOUPPER;
 899  913  
 900  914          zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
 901  915          zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os);
 902  916  
 903  917          uint64_t sa_obj = 0;
 904  918          if (zfsvfs->z_use_sa) {
 905  919                  /* should either have both of these objects or none */
 906  920                  error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1,
 907  921                      &sa_obj);
 908  922                  if (error != 0)
 909  923                          return (error);
 910  924          }
 911  925  
 912  926          error = sa_setup(os, sa_obj, zfs_attr_table, ZPL_END,
 913  927              &zfsvfs->z_attr_table);
 914  928          if (error != 0)
 915  929                  return (error);
 916  930  
 917  931          if (zfsvfs->z_version >= ZPL_VERSION_SA)
 918  932                  sa_register_update_callback(os, zfs_sa_upgrade);
 919  933  
 920  934          error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_ROOT_OBJ, 8, 1,
 921  935              &zfsvfs->z_root);
 922  936          if (error != 0)
 923  937                  return (error);
 924  938          ASSERT(zfsvfs->z_root != 0);
 925  939  
 926  940          error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET, 8, 1,
 927  941              &zfsvfs->z_unlinkedobj);
 928  942          if (error != 0)
 929  943                  return (error);
 930  944  
 931  945          error = zap_lookup(os, MASTER_NODE_OBJ,
 932  946              zfs_userquota_prop_prefixes[ZFS_PROP_USERQUOTA],
 933  947              8, 1, &zfsvfs->z_userquota_obj);
 934  948          if (error == ENOENT)
 935  949                  zfsvfs->z_userquota_obj = 0;
 936  950          else if (error != 0)
 937  951                  return (error);
 938  952  
 939  953          error = zap_lookup(os, MASTER_NODE_OBJ,
 940  954              zfs_userquota_prop_prefixes[ZFS_PROP_GROUPQUOTA],
 941  955              8, 1, &zfsvfs->z_groupquota_obj);
 942  956          if (error == ENOENT)
 943  957                  zfsvfs->z_groupquota_obj = 0;
 944  958          else if (error != 0)
 945  959                  return (error);
 946  960  
 947  961          error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1,
 948  962              &zfsvfs->z_fuid_obj);
 949  963          if (error == ENOENT)
 950  964                  zfsvfs->z_fuid_obj = 0;
 951  965          else if (error != 0)
 952  966                  return (error);
 953  967  
 954  968          error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SHARES_DIR, 8, 1,
 955  969              &zfsvfs->z_shares_dir);
 956  970          if (error == ENOENT)
 957  971                  zfsvfs->z_shares_dir = 0;
 958  972          else if (error != 0)
 959  973                  return (error);
 960  974  
 961  975          return (0);
 962  976  }
 963  977  
 964  978  int
 965  979  zfsvfs_create(const char *osname, zfsvfs_t **zfvp)
 966  980  {
 967  981          objset_t *os;
 968  982          zfsvfs_t *zfsvfs;
 969  983          int error;
 970  984  
 971  985          zfsvfs = kmem_zalloc(sizeof (zfsvfs_t), KM_SLEEP);
 972  986  
 973  987          /*
 974  988           * We claim to always be readonly so we can open snapshots;
 975  989           * other ZPL code will prevent us from writing to snapshots.
 976  990           */
 977  991  
 978  992          error = dmu_objset_own(osname, DMU_OST_ZFS, B_TRUE, zfsvfs, &os);
 979  993          if (error != 0) {
 980  994                  kmem_free(zfsvfs, sizeof (zfsvfs_t));
 981  995                  return (error);
 982  996          }
 983  997  
 984  998          error = zfsvfs_create_impl(zfvp, zfsvfs, os);
 984  999          if (error != 0) {
 986 1000                  dmu_objset_disown(os, zfsvfs);
 987 1001          }
 988 1002          return (error);
 989 1003  }
 990 1004  
 991 1005  
 992 1006  int
 993 1007  zfsvfs_create_impl(zfsvfs_t **zfvp, zfsvfs_t *zfsvfs, objset_t *os)
 994 1008  {
 995 1009          int error;
     1010 +        int size = spa_get_obj_mtx_sz(dmu_objset_spa(os));
 996 1011  
 997 1012          zfsvfs->z_vfs = NULL;
 998 1013          zfsvfs->z_parent = zfsvfs;
 999 1014  
1000 1015          mutex_init(&zfsvfs->z_znodes_lock, NULL, MUTEX_DEFAULT, NULL);
1001 1016          mutex_init(&zfsvfs->z_lock, NULL, MUTEX_DEFAULT, NULL);
1002 1017          list_create(&zfsvfs->z_all_znodes, sizeof (znode_t),
1003 1018              offsetof(znode_t, z_link_node));
1004 1019          rrm_init(&zfsvfs->z_teardown_lock, B_FALSE);
1005 1020          rw_init(&zfsvfs->z_teardown_inactive_lock, NULL, RW_DEFAULT, NULL);
1006 1021          rw_init(&zfsvfs->z_fuid_lock, NULL, RW_DEFAULT, NULL);
1007      -        for (int i = 0; i != ZFS_OBJ_MTX_SZ; i++)
     1022 +        zfsvfs->z_hold_mtx_sz = size;
     1023 +        zfsvfs->z_hold_mtx = kmem_zalloc(sizeof (kmutex_t) * size, KM_SLEEP);
     1024 +        for (int i = 0; i != size; i++)
1008 1025                  mutex_init(&zfsvfs->z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
     1026 +        mutex_init(&zfsvfs->z_drain_lock, NULL, MUTEX_DEFAULT, NULL);
     1027 +        cv_init(&zfsvfs->z_drain_cv, NULL, CV_DEFAULT, NULL);
1009 1028  
1010 1029          error = zfsvfs_init(zfsvfs, os);
1011 1030          if (error != 0) {
1012 1031                  *zfvp = NULL;
     1032 +                kmem_free(zfsvfs->z_hold_mtx, sizeof (kmutex_t) * size);
1013 1033                  kmem_free(zfsvfs, sizeof (zfsvfs_t));
1014 1034                  return (error);
1015 1035          }
1016 1036  
1017 1037          *zfvp = zfsvfs;
1018 1038          return (0);
1019 1039  }
1020 1040  
1021 1041  static int
1022 1042  zfsvfs_setup(zfsvfs_t *zfsvfs, boolean_t mounting)
1023 1043  {
1024 1044          int error;
1025 1045  
1026 1046          error = zfs_register_callbacks(zfsvfs->z_vfs);
1027 1047          if (error)
1028 1048                  return (error);
1029 1049  
1030 1050          zfsvfs->z_log = zil_open(zfsvfs->z_os, zfs_get_data);
1031 1051  
1032 1052          /*
1033 1053           * If we are not mounting (i.e., online recv), then we don't
1034 1054           * have to worry about replaying the log as we blocked all
1035 1055           * operations out since we closed the ZIL.
1036 1056           */
1037 1057          if (mounting) {
1038 1058                  boolean_t readonly;
1039 1059  
1040 1060                  /*
1041 1061                   * During replay we remove the read only flag to
1042 1062                   * allow replays to succeed.
1043 1063                   */
1044 1064                  readonly = zfsvfs->z_vfs->vfs_flag & VFS_RDONLY;
1045      -                if (readonly != 0)
     1065 +                if (readonly)
1046 1066                          zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
1047      -                else
     1067 +                else {
1048 1068                          zfs_unlinked_drain(zfsvfs);
     1069 +                }
1049 1070  
1050 1071                  /*
1051 1072                   * Parse and replay the intent log.
1052 1073                   *
1053 1074                   * Because of ziltest, this must be done after
1054 1075                   * zfs_unlinked_drain().  (Further note: ziltest
1055 1076                   * doesn't use readonly mounts, where
1056 1077                   * zfs_unlinked_drain() isn't called.)  This is because
1057 1078                   * ziltest causes spa_sync() to think it's committed,
1058 1079                   * but actually it is not, so the intent log contains
1059 1080                   * many txg's worth of changes.
1060 1081                   *
1061 1082                   * In particular, if object N is in the unlinked set in
1062 1083                   * the last txg to actually sync, then it could be
1063 1084                   * actually freed in a later txg and then reallocated
1064 1085                   * in a yet later txg.  This would write a "create
1065 1086                   * object N" record to the intent log.  Normally, this
1066 1087                   * would be fine because the spa_sync() would have
1067 1088                   * written out the fact that object N is free, before
1068 1089                   * we could write the "create object N" intent log
1069 1090                   * record.
1070 1091                   *
1071 1092                   * But when we are in ziltest mode, we advance the "open
1072 1093                   * txg" without actually spa_sync()-ing the changes to
1073 1094                   * disk.  So we would see that object N is still
1074 1095                   * allocated and in the unlinked set, and there is an
1075 1096                   * intent log record saying to allocate it.
1076 1097                   */
1077 1098                  if (spa_writeable(dmu_objset_spa(zfsvfs->z_os))) {
1078 1099                          if (zil_replay_disable) {
1079 1100                                  zil_destroy(zfsvfs->z_log, B_FALSE);
1080 1101                          } else {
1081 1102                                  zfsvfs->z_replay = B_TRUE;
1082 1103                                  zil_replay(zfsvfs->z_os, zfsvfs,
1083 1104                                      zfs_replay_vector);
1084 1105                                  zfsvfs->z_replay = B_FALSE;
1085 1106                          }
1086 1107                  }
1087      -                zfsvfs->z_vfs->vfs_flag |= readonly; /* restore readonly bit */
     1108 +
     1109 +                /* restore readonly bit */
     1110 +                if (readonly)
     1111 +                        zfsvfs->z_vfs->vfs_flag |= VFS_RDONLY;
1088 1112          }
1089 1113  
1090 1114          /*
1091 1115           * Set the objset user_ptr to track its zfsvfs.
1092 1116           */
1093 1117          mutex_enter(&zfsvfs->z_os->os_user_ptr_lock);
1094 1118          dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
1095 1119          mutex_exit(&zfsvfs->z_os->os_user_ptr_lock);
1096 1120  
1097 1121          return (0);
1098 1122  }
1099 1123  
1100 1124  void
1101 1125  zfsvfs_free(zfsvfs_t *zfsvfs)
1102 1126  {
1103 1127          int i;
1104 1128          extern krwlock_t zfsvfs_lock; /* in zfs_znode.c */
1105 1129  
1106 1130          /*
1107 1131           * This is a barrier to prevent the filesystem from going away in
1108 1132           * zfs_znode_move() until we can safely ensure that the filesystem is
1109 1133           * not unmounted. We consider the filesystem valid before the barrier
1110 1134           * and invalid after the barrier.
1111 1135           */
1112 1136          rw_enter(&zfsvfs_lock, RW_READER);
1113 1137          rw_exit(&zfsvfs_lock);
1114 1138  
     1139 +        VERIFY0(zfsvfs->z_znodes_freeing_cnt);
     1140 +
1115 1141          zfs_fuid_destroy(zfsvfs);
1116 1142  
     1143 +        cv_destroy(&zfsvfs->z_drain_cv);
     1144 +        mutex_destroy(&zfsvfs->z_drain_lock);
1117 1145          mutex_destroy(&zfsvfs->z_znodes_lock);
1118 1146          mutex_destroy(&zfsvfs->z_lock);
1119 1147          list_destroy(&zfsvfs->z_all_znodes);
1120 1148          rrm_destroy(&zfsvfs->z_teardown_lock);
1121 1149          rw_destroy(&zfsvfs->z_teardown_inactive_lock);
1122 1150          rw_destroy(&zfsvfs->z_fuid_lock);
1123      -        for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
     1151 +        for (i = 0; i != zfsvfs->z_hold_mtx_sz; i++)
1124 1152                  mutex_destroy(&zfsvfs->z_hold_mtx[i]);
     1153 +
     1154 +        kmem_free(zfsvfs->z_hold_mtx,
     1155 +            sizeof (kmutex_t) * zfsvfs->z_hold_mtx_sz);
1125 1156          kmem_free(zfsvfs, sizeof (zfsvfs_t));
1126 1157  }
1127 1158  
1128 1159  static void
1129 1160  zfs_set_fuid_feature(zfsvfs_t *zfsvfs)
1130 1161  {
1131 1162          zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
1132 1163          if (zfsvfs->z_vfs) {
1133 1164                  if (zfsvfs->z_use_fuids) {
1134 1165                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_XVATTR);
1135 1166                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS);
1136 1167                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS);
1137 1168                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE);
1138 1169                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER);
1139 1170                          vfs_set_feature(zfsvfs->z_vfs, VFSFT_REPARSE);
1140 1171                  } else {
1141 1172                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_XVATTR);
1142 1173                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS);
1143 1174                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS);
1144 1175                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE);
1145 1176                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER);
1146 1177                          vfs_clear_feature(zfsvfs->z_vfs, VFSFT_REPARSE);
1147 1178                  }
1148 1179          }
1149 1180          zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os);
1150 1181  }
1151 1182  
1152 1183  static int
1153 1184  zfs_domount(vfs_t *vfsp, char *osname)
1154 1185  {
1155 1186          dev_t mount_dev;
1156 1187          uint64_t recordsize, fsid_guid;
1157 1188          int error = 0;
1158 1189          zfsvfs_t *zfsvfs;
     1190 +        char    worminfo[13] = {0};
1159 1191  
1160 1192          ASSERT(vfsp);
1161 1193          ASSERT(osname);
1162 1194  
1163 1195          error = zfsvfs_create(osname, &zfsvfs);
1164 1196          if (error)
1165 1197                  return (error);
1166 1198          zfsvfs->z_vfs = vfsp;
1167 1199  
1168 1200          /* Initialize the generic filesystem structure. */
1169 1201          vfsp->vfs_bcount = 0;
1170 1202          vfsp->vfs_data = NULL;
1171 1203
1172 1204          if (zfs_create_unique_device(&mount_dev) == -1) {
1173 1205                  error = SET_ERROR(ENODEV);
1174 1206                  goto out;
1175 1207          }
1176 1208          ASSERT(vfs_devismounted(mount_dev) == 0);
1177 1209  
1178 1210          if (error = dsl_prop_get_integer(osname, "recordsize", &recordsize,
1179 1211              NULL))
1180 1212                  goto out;
1181 1213  
     1214 +        if (dsl_prop_get(osname, "nms:worm", 1, 12, &worminfo, NULL) == 0 &&
     1215 +            worminfo[0] && strcmp(worminfo, "0") != 0 &&
     1216 +            strcmp(worminfo, "off") != 0 && strcmp(worminfo, "-") != 0) {
     1217 +                zfsvfs->z_isworm = B_TRUE;
     1218 +        } else {
     1219 +                zfsvfs->z_isworm = B_FALSE;
     1220 +        }
     1221 +
1182 1222          vfsp->vfs_dev = mount_dev;
1183 1223          vfsp->vfs_fstype = zfsfstype;
1184 1224          vfsp->vfs_bsize = recordsize;
1185 1225          vfsp->vfs_flag |= VFS_NOTRUNC;
1186 1226          vfsp->vfs_data = zfsvfs;
1187 1227  
1188 1228          /*
1189 1229           * The fsid is 64 bits, composed of an 8-bit fs type, which
1190 1230           * separates our fsid from any other filesystem types, and a
1191 1231           * 56-bit objset unique ID.  The objset unique ID is unique to
1192 1232           * all objsets open on this system, provided by unique_create().
1193 1233           * The 8-bit fs type must be put in the low bits of fsid[1]
1194 1234           * because that's where other Solaris filesystems put it.
1195 1235           */
1196 1236          fsid_guid = dmu_objset_fsid_guid(zfsvfs->z_os);
1197 1237          ASSERT((fsid_guid & ~((1ULL<<56)-1)) == 0);
1198 1238          vfsp->vfs_fsid.val[0] = fsid_guid;
1199 1239          vfsp->vfs_fsid.val[1] = ((fsid_guid>>32) << 8) |
1200 1240              zfsfstype & 0xFF;
1201 1241  
1202 1242          /*
1203 1243           * Set features for file system.
1204 1244           */
1205 1245          zfs_set_fuid_feature(zfsvfs);
1206 1246          if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE) {
1207 1247                  vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
1208 1248                  vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
1209 1249                  vfs_set_feature(vfsp, VFSFT_NOCASESENSITIVE);
1210 1250          } else if (zfsvfs->z_case == ZFS_CASE_MIXED) {
1211 1251                  vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
1212 1252                  vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
1213 1253          }
1214 1254          vfs_set_feature(vfsp, VFSFT_ZEROCOPY_SUPPORTED);
1215 1255  
1216 1256          if (dmu_objset_is_snapshot(zfsvfs->z_os)) {
1217 1257                  uint64_t pval;
1218 1258  
1219 1259                  atime_changed_cb(zfsvfs, B_FALSE);
1220 1260                  readonly_changed_cb(zfsvfs, B_TRUE);
1221 1261                  if (error = dsl_prop_get_integer(osname, "xattr", &pval, NULL))
1222 1262                          goto out;
1223 1263                  xattr_changed_cb(zfsvfs, pval);
1224 1264                  zfsvfs->z_issnap = B_TRUE;
1225 1265                  zfsvfs->z_os->os_sync = ZFS_SYNC_DISABLED;
1226 1266  
1227 1267                  mutex_enter(&zfsvfs->z_os->os_user_ptr_lock);
1228 1268                  dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
1229 1269                  mutex_exit(&zfsvfs->z_os->os_user_ptr_lock);
1230 1270          } else {
1231 1271                  error = zfsvfs_setup(zfsvfs, B_TRUE);
1232 1272          }
1233 1273  
1234 1274          if (!zfsvfs->z_issnap)
1235 1275                  zfsctl_create(zfsvfs);
1236 1276  out:
1237 1277          if (error) {
1238 1278                  dmu_objset_disown(zfsvfs->z_os, zfsvfs);
1239 1279                  zfsvfs_free(zfsvfs);
1240 1280          } else {
1241 1281                  atomic_inc_32(&zfs_active_fs_count);
1242 1282          }
1243 1283  
1244 1284          return (error);
1245 1285  }
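
The fsid layout described in zfs_domount() (a 56-bit objset unique ID plus an 8-bit filesystem type in the low bits of the second word) can be sketched in isolation as follows. The sketch_fsid_t type and both helper names are hypothetical, and the two words are modeled as uint32_t purely for illustration; the kernel's fsid_t may use a different element type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's two-word fsid_t. */
typedef struct sketch_fsid {
	uint32_t val[2];
} sketch_fsid_t;

/*
 * Pack a 56-bit objset ID and an 8-bit fs type into two 32-bit words:
 *   val[0] = low 32 bits of the ID
 *   val[1] = (high 24 bits of the ID << 8) | fs type
 */
static sketch_fsid_t
sketch_make_fsid(uint64_t fsid_guid, int fstype)
{
	sketch_fsid_t fsid;

	fsid.val[0] = (uint32_t)fsid_guid;
	fsid.val[1] = (uint32_t)(((fsid_guid >> 32) << 8) |
	    ((uint64_t)fstype & 0xFF));
	return (fsid);
}

/* Reverse the packing to recover the 56-bit objset ID. */
static uint64_t
sketch_fsid_to_guid(sketch_fsid_t fsid)
{
	return (((uint64_t)(fsid.val[1] >> 8) << 32) | fsid.val[0]);
}
```

The round trip only works when the top 8 bits of the ID are zero, which is the condition the ASSERT on fsid_guid in zfs_domount() checks.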
1246 1286  
1247 1287  void
1248 1288  zfs_unregister_callbacks(zfsvfs_t *zfsvfs)
1249 1289  {
1250 1290          objset_t *os = zfsvfs->z_os;
1251 1291  
1252 1292          if (!dmu_objset_is_snapshot(os))
1253 1293                  dsl_prop_unregister_all(dmu_objset_ds(os), zfsvfs);
1254 1294  }
1255 1295  
1256 1296  /*
1257 1297   * Convert a decimal digit string to a uint64_t integer.
1258 1298   */
1259 1299  static int
1260 1300  str_to_uint64(char *str, uint64_t *objnum)
1261 1301  {
1262 1302          uint64_t num = 0;
1263 1303  
1264 1304          while (*str) {
1265 1305                  if (*str < '0' || *str > '9')
1266 1306                          return (SET_ERROR(EINVAL));
1267 1307  
1268 1308                  num = num*10 + *str++ - '0';
1269 1309          }
1270 1310  
1271 1311          *objnum = num;
1272 1312          return (0);
1273 1313  }
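
str_to_uint64() rejects non-digit characters, but its accumulator can wrap silently on very long inputs. The standalone sketch below shows one possible way to add an overflow check to the same loop. The function name is hypothetical, and returning ERANGE on overflow is an assumption for illustration, not behavior of the code in this diff.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Convert a decimal digit string to a uint64_t.
 * Returns 0 on success, EINVAL on a non-digit character, and ERANGE
 * (an assumed choice) if the value does not fit in 64 bits.
 * Like the original, an empty string yields 0.
 */
static int
sketch_str_to_uint64(const char *str, uint64_t *out)
{
	uint64_t num = 0;

	for (; *str != '\0'; str++) {
		unsigned int digit;

		if (*str < '0' || *str > '9')
			return (EINVAL);
		digit = (unsigned int)(*str - '0');

		/* num * 10 + digit fits iff num <= (MAX - digit) / 10. */
		if (num > (UINT64_MAX - digit) / 10)
			return (ERANGE);
		num = num * 10 + digit;
	}

	*out = num;
	return (0);
}
```

The output is written only on success, so a caller's variable is left untouched when parsing fails.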
1274 1314  
1275 1315  /*
1276 1316   * The boot path passed from the boot loader is in the form of
1277 1317   * "rootpool-name/root-filesystem-object-number". Convert this
1278 1318   * string to a dataset name: "rootpool-name/root-filesystem-name".
1279 1319   */
1280 1320  static int
1281 1321  zfs_parse_bootfs(char *bpath, char *outpath)
1282 1322  {
1283 1323          char *slashp;
1284 1324          uint64_t objnum;
1285 1325          int error;
1286 1326  
1287 1327          if (*bpath == 0 || *bpath == '/')
1288 1328                  return (SET_ERROR(EINVAL));
1289 1329  
1290 1330          (void) strcpy(outpath, bpath);
1291 1331  
1292 1332          slashp = strchr(bpath, '/');
1293 1333  
1294 1334          /* if no '/', just return the pool name */
1295 1335          if (slashp == NULL) {
1296 1336                  return (0);
1297 1337          }
1298 1338  
1299 1339          /* if not a number, just return the root dataset name */
1300 1340          if (str_to_uint64(slashp+1, &objnum)) {
1301 1341                  return (0);
1302 1342          }
1303 1343  
1304 1344          *slashp = '\0';
1305 1345          error = dsl_dsobj_to_dsname(bpath, objnum, outpath);
1306 1346          *slashp = '/';
1307 1347  
1308 1348          return (error);
1309 1349  }
1310 1350  
1311 1351  /*
1312 1352   * Check that the hex label string is appropriate for the dataset being
1313 1353   * mounted into the global_zone proper.
1314 1354   *
1315 1355   * Return an error if the hex label string is not default or
1316 1356   * admin_low/admin_high.  For admin_low labels, the corresponding
1317 1357   * dataset must be readonly.
1318 1358   */
1319 1359  int
1320 1360  zfs_check_global_label(const char *dsname, const char *hexsl)
1321 1361  {
1322 1362          if (strcasecmp(hexsl, ZFS_MLSLABEL_DEFAULT) == 0)
1323 1363                  return (0);
1324 1364          if (strcasecmp(hexsl, ADMIN_HIGH) == 0)
1325 1365                  return (0);
1326 1366          if (strcasecmp(hexsl, ADMIN_LOW) == 0) {
1327 1367                  /* must be readonly */
1328 1368                  uint64_t rdonly;
1329 1369  
1330 1370                  if (dsl_prop_get_integer(dsname,
1331 1371                      zfs_prop_to_name(ZFS_PROP_READONLY), &rdonly, NULL))
1332 1372                          return (SET_ERROR(EACCES));
1333 1373                  return (rdonly ? 0 : EACCES);
1334 1374          }
1335 1375          return (SET_ERROR(EACCES));
1336 1376  }
1337 1377  
1338 1378  /*
1339 1379   * Determine whether the mount is allowed according to the MAC check
1340 1380   * by comparing (where appropriate) the label of the dataset against
1341 1381   * the label of the zone being mounted into.  If the dataset has
1342 1382   * no label, create one.
1343 1383   *
1344 1384   * Returns 0 if access allowed, error otherwise (e.g. EACCES)
1345 1385   */
1346 1386  static int
1347 1387  zfs_mount_label_policy(vfs_t *vfsp, char *osname)
1348 1388  {
1349 1389          int             error, retv;
1350 1390          zone_t          *mntzone = NULL;
1351 1391          ts_label_t      *mnt_tsl;
1352 1392          bslabel_t       *mnt_sl;
1353 1393          bslabel_t       ds_sl;
1354 1394          char            ds_hexsl[MAXNAMELEN];
1355 1395  
1356 1396          retv = EACCES;                          /* assume the worst */
1357 1397  
1358 1398          /*
1359 1399           * Start by getting the dataset label if it exists.
1360 1400           */
1361 1401          error = dsl_prop_get(osname, zfs_prop_to_name(ZFS_PROP_MLSLABEL),
1362 1402              1, sizeof (ds_hexsl), &ds_hexsl, NULL);
1363 1403          if (error)
1364 1404                  return (SET_ERROR(EACCES));
1365 1405  
1366 1406          /*
1367 1407           * If labeling is NOT enabled, then disallow the mount of datasets
1368 1408           * which have a non-default label already.  No other label checks
1369 1409           * are needed.
1370 1410           */
1371 1411          if (!is_system_labeled()) {
1372 1412                  if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0)
1373 1413                          return (0);
1374 1414                  return (SET_ERROR(EACCES));
1375 1415          }
1376 1416  
1377 1417          /*
1378 1418           * Get the label of the mountpoint.  If mounting into the global
1379 1419           * zone (i.e. mountpoint is not within an active zone and the
1380 1420           * zoned property is off), the label must be default or
1381 1421           * admin_low/admin_high only; no other checks are needed.
1382 1422           */
1383 1423          mntzone = zone_find_by_any_path(refstr_value(vfsp->vfs_mntpt), B_FALSE);
1384 1424          if (mntzone->zone_id == GLOBAL_ZONEID) {
1385 1425                  uint64_t zoned;
1386 1426  
1387 1427                  zone_rele(mntzone);
1388 1428  
1389 1429                  if (dsl_prop_get_integer(osname,
1390 1430                      zfs_prop_to_name(ZFS_PROP_ZONED), &zoned, NULL))
1391 1431                          return (SET_ERROR(EACCES));
1392 1432                  if (!zoned)
1393 1433                          return (zfs_check_global_label(osname, ds_hexsl));
1394 1434                  else
1395 1435                          /*
1396 1436                           * This is the case of a zone dataset being mounted
1397 1437                           * initially, before the zone has been fully created;
1398 1438                           * allow this mount into global zone.
1399 1439                           */
1400 1440                          return (0);
1401 1441          }
1402 1442  
1403 1443          mnt_tsl = mntzone->zone_slabel;
1404 1444          ASSERT(mnt_tsl != NULL);
1405 1445          label_hold(mnt_tsl);
1406 1446          mnt_sl = label2bslabel(mnt_tsl);
1407 1447  
1408 1448          if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0) {
1409 1449                  /*
1410 1450                   * The dataset doesn't have a real label, so fabricate one.
1411 1451                   */
1412 1452                  char *str = NULL;
1413 1453  
1414 1454                  if (l_to_str_internal(mnt_sl, &str) == 0 &&
1415 1455                      dsl_prop_set_string(osname,
1416 1456                      zfs_prop_to_name(ZFS_PROP_MLSLABEL),
1417 1457                      ZPROP_SRC_LOCAL, str) == 0)
1418 1458                          retv = 0;
1419 1459                  if (str != NULL)
1420 1460                          kmem_free(str, strlen(str) + 1);
1421 1461          } else if (hexstr_to_label(ds_hexsl, &ds_sl) == 0) {
1422 1462                  /*
1423 1463                   * Now compare labels to complete the MAC check.  If the
1424 1464                   * labels are equal then allow access.  If the mountpoint
1425 1465                   * label dominates the dataset label, allow readonly access.
1426 1466                   * Otherwise, access is denied.
1427 1467                   */
1428 1468                  if (blequal(mnt_sl, &ds_sl))
1429 1469                          retv = 0;
1430 1470                  else if (bldominates(mnt_sl, &ds_sl)) {
1431 1471                          vfs_setmntopt(vfsp, MNTOPT_RO, NULL, 0);
1432 1472                          retv = 0;
1433 1473                  }
1434 1474          }
1435 1475  
1436 1476          label_rele(mnt_tsl);
1437 1477          zone_rele(mntzone);
1438 1478          return (retv);
1439 1479  }
1440 1480  
1441 1481  static int
1442 1482  zfs_mountroot(vfs_t *vfsp, enum whymountroot why)
1443 1483  {
1444 1484          int error = 0;
1445 1485          static int zfsrootdone = 0;
1446 1486          zfsvfs_t *zfsvfs = NULL;
1447 1487          znode_t *zp = NULL;
1448 1488          vnode_t *vp = NULL;
1449 1489          char *zfs_bootfs;
1450 1490          char *zfs_devid;
1451 1491  
1452 1492          ASSERT(vfsp);
1453 1493  
1454 1494          /*
1455 1495           * The filesystem that we mount as root is defined in the
1456 1496           * boot property "zfs-bootfs" with a format of
1457 1497           * "poolname/root-dataset-objnum".
1458 1498           */
1459 1499          if (why == ROOT_INIT) {
1460 1500                  if (zfsrootdone++)
1461 1501                          return (SET_ERROR(EBUSY));
1462 1502                  /*
1463 1503                   * the process of doing a spa_load will require the
1464 1504                   * clock to be set before we could (for example) do
1465 1505                   * something better by looking at the timestamp on
1466 1506                   * an uberblock, so just set it to -1.
1467 1507                   */
1468 1508                  clkset(-1);
1469 1509  
1470 1510                  if ((zfs_bootfs = spa_get_bootprop("zfs-bootfs")) == NULL) {
1471 1511                          cmn_err(CE_NOTE, "spa_get_bootfs: can not get "
1472 1512                              "bootfs name");
1473 1513                          return (SET_ERROR(EINVAL));
1474 1514                  }
1475 1515                  zfs_devid = spa_get_bootprop("diskdevid");
1476 1516                  error = spa_import_rootpool(rootfs.bo_name, zfs_devid);
1477 1517                  if (zfs_devid)
1478 1518                          spa_free_bootprop(zfs_devid);
1479 1519                  if (error) {
1480 1520                          spa_free_bootprop(zfs_bootfs);
1481 1521                          cmn_err(CE_NOTE, "spa_import_rootpool: error %d",
1482 1522                              error);
1483 1523                          return (error);
1484 1524                  }
1485 1525                  if (error = zfs_parse_bootfs(zfs_bootfs, rootfs.bo_name)) {
1486 1526                          spa_free_bootprop(zfs_bootfs);
1487 1527                          cmn_err(CE_NOTE, "zfs_parse_bootfs: error %d",
1488 1528                              error);
1489 1529                          return (error);
1490 1530                  }
1491 1531  
1492 1532                  spa_free_bootprop(zfs_bootfs);
1493 1533  
1494 1534                  if (error = vfs_lock(vfsp))
1495 1535                          return (error);
1496 1536  
1497 1537                  if (error = zfs_domount(vfsp, rootfs.bo_name)) {
1498 1538                          cmn_err(CE_NOTE, "zfs_domount: error %d", error);
1499 1539                          goto out;
1500 1540                  }
1501 1541  
1502 1542                  zfsvfs = (zfsvfs_t *)vfsp->vfs_data;
1503 1543                  ASSERT(zfsvfs);
1504 1544                  if (error = zfs_zget(zfsvfs, zfsvfs->z_root, &zp)) {
1505 1545                          cmn_err(CE_NOTE, "zfs_zget: error %d", error);
1506 1546                          goto out;
1507 1547                  }
1508 1548  
1509 1549                  vp = ZTOV(zp);
1510 1550                  mutex_enter(&vp->v_lock);
1511 1551                  vp->v_flag |= VROOT;
1512 1552                  mutex_exit(&vp->v_lock);
1513 1553                  rootvp = vp;
1514 1554  
1515 1555                  /*
1516 1556                   * Leave rootvp held.  The root file system is never unmounted.
1517 1557                   */
1518 1558  
1519 1559                  vfs_add((struct vnode *)0, vfsp,
1520 1560                      (vfsp->vfs_flag & VFS_RDONLY) ? MS_RDONLY : 0);
1521 1561  out:
1522 1562                  vfs_unlock(vfsp);
1523 1563                  return (error);
1524 1564          } else if (why == ROOT_REMOUNT) {
1525 1565                  readonly_changed_cb(vfsp->vfs_data, B_FALSE);
1526 1566                  vfsp->vfs_flag |= VFS_REMOUNT;
1527 1567  
1528 1568                  /* refresh mount options */
1529 1569                  zfs_unregister_callbacks(vfsp->vfs_data);
1530 1570                  return (zfs_register_callbacks(vfsp));
1531 1571  
1532 1572          } else if (why == ROOT_UNMOUNT) {
1533 1573                  zfs_unregister_callbacks((zfsvfs_t *)vfsp->vfs_data);
1534 1574                  (void) zfs_sync(vfsp, 0, 0);
1535 1575                  return (0);
1536 1576          }
1537 1577  
1538 1578          /*
1539 1579           * if "why" is equal to anything else other than ROOT_INIT,
1540 1580           * ROOT_REMOUNT, or ROOT_UNMOUNT, we do not support it.
1541 1581           */
1542 1582          return (SET_ERROR(ENOTSUP));
1543 1583  }
1544 1584  
1545 1585  /*ARGSUSED*/
1546 1586  static int
1547 1587  zfs_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
1548 1588  {
1549 1589          char            *osname;
1550 1590          pathname_t      spn;
1551 1591          int             error = 0;
1552 1592          uio_seg_t       fromspace = (uap->flags & MS_SYSSPACE) ?
1553 1593              UIO_SYSSPACE : UIO_USERSPACE;
1554 1594          int             canwrite;
1555 1595  
1556 1596          if (mvp->v_type != VDIR)
1557 1597                  return (SET_ERROR(ENOTDIR));
1558 1598  
1559 1599          mutex_enter(&mvp->v_lock);
1560 1600          if ((uap->flags & MS_REMOUNT) == 0 &&
1561 1601              (uap->flags & MS_OVERLAY) == 0 &&
1562 1602              (mvp->v_count != 1 || (mvp->v_flag & VROOT))) {
1563 1603                  mutex_exit(&mvp->v_lock);
1564 1604                  return (SET_ERROR(EBUSY));
1565 1605          }
1566 1606          mutex_exit(&mvp->v_lock);
1567 1607  
1568 1608          /*
1569 1609           * ZFS does not support passing unparsed data in via MS_DATA.
1570 1610           * Users should use the MS_OPTIONSTR interface; this means
1571 1611           * that all option parsing is already done and the options struct
1572 1612           * can be interrogated.
1573 1613           */
1574 1614          if ((uap->flags & MS_DATA) && uap->datalen > 0)
1575 1615                  return (SET_ERROR(EINVAL));
1576 1616  
1577 1617          /*
1578 1618           * Get the objset name (the "special" mount argument).
1579 1619           */
1580 1620          if (error = pn_get(uap->spec, fromspace, &spn))
1581 1621                  return (error);
1582 1622  
1583 1623          osname = spn.pn_path;
1584 1624  
1585 1625          /*
1586 1626           * Check for mount privilege?
1587 1627           *
1588 1628           * If we don't have privilege then see if
1589 1629           * we have local permission to allow it
1590 1630           */
1591 1631          error = secpolicy_fs_mount(cr, mvp, vfsp);
1592 1632          if (error) {
1593 1633                  if (dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr) == 0) {
1594 1634                          vattr_t         vattr;
1595 1635  
1596 1636                          /*
1597 1637                           * Make sure user is the owner of the mount point
1598 1638                           * or has sufficient privileges.
1599 1639                           */
1600 1640  
1601 1641                          vattr.va_mask = AT_UID;
1602 1642  
1603 1643                          if (VOP_GETATTR(mvp, &vattr, 0, cr, NULL)) {
1604 1644                                  goto out;
1605 1645                          }
1606 1646  
1607 1647                          if (secpolicy_vnode_owner(cr, vattr.va_uid) != 0 &&
1608 1648                              VOP_ACCESS(mvp, VWRITE, 0, cr, NULL) != 0) {
1609 1649                                  goto out;
1610 1650                          }
1611 1651                          secpolicy_fs_mount_clearopts(cr, vfsp);
1612 1652                  } else {
1613 1653                          goto out;
1614 1654                  }
1615 1655          }
1616 1656  
1617 1657          /*
1618 1658           * Refuse to mount a filesystem if we are in a local zone and the
1619 1659           * dataset is not visible.
1620 1660           */
1621 1661          if (!INGLOBALZONE(curproc) &&
1622 1662              (!zone_dataset_visible(osname, &canwrite) || !canwrite)) {
1623 1663                  error = SET_ERROR(EPERM);
1624 1664                  goto out;
1625 1665          }
1626 1666  
1627 1667          error = zfs_mount_label_policy(vfsp, osname);
1628 1668          if (error)
1629 1669                  goto out;
1630 1670  
1631 1671          /*
1632 1672           * When doing a remount, we simply refresh our temporary properties
1633 1673           * according to those options set in the current VFS options.
1634 1674           */
1635 1675          if (uap->flags & MS_REMOUNT) {
1636 1676                  /* refresh mount options */
1637 1677                  zfs_unregister_callbacks(vfsp->vfs_data);
1638 1678                  error = zfs_register_callbacks(vfsp);
1639 1679                  goto out;
1640 1680          }
1641 1681  
1642 1682          error = zfs_domount(vfsp, osname);
1643 1683  
1644 1684          /*
1645 1685           * Add an extra VFS_HOLD on our parent vfs so that it can't
1646 1686           * disappear due to a forced unmount.
1647 1687           */
1648 1688          if (error == 0 && ((zfsvfs_t *)vfsp->vfs_data)->z_issnap)
1649 1689                  VFS_HOLD(mvp->v_vfsp);
1650 1690  
1651 1691  out:
1652 1692          pn_free(&spn);
1653 1693          return (error);
1654 1694  }
1655 1695  
1656 1696  static int
1657 1697  zfs_statvfs(vfs_t *vfsp, struct statvfs64 *statp)
1658 1698  {
1659 1699          zfsvfs_t *zfsvfs = vfsp->vfs_data;
1660 1700          dev32_t d32;
1661 1701          uint64_t refdbytes, availbytes, usedobjs, availobjs;
1662 1702  
1663 1703          ZFS_ENTER(zfsvfs);
1664 1704  
1665 1705          dmu_objset_space(zfsvfs->z_os,
1666 1706              &refdbytes, &availbytes, &usedobjs, &availobjs);
1667 1707  
1668 1708          /*
1669 1709           * The underlying storage pool actually uses multiple block sizes.
1670 1710           * We report the fragsize as the smallest block size we support,
1671 1711           * and we report our blocksize as the filesystem's maximum blocksize.
1672 1712           */
1673 1713          statp->f_frsize = 1UL << SPA_MINBLOCKSHIFT;
1674 1714          statp->f_bsize = zfsvfs->z_max_blksz;
1675 1715  
1676 1716          /*
1677 1717           * The following report "total" blocks of various kinds in the
1678 1718           * file system, but reported in terms of f_frsize - the
1679 1719           * "fragment" size.
1680 1720           */
1681 1721  
1682 1722          statp->f_blocks = (refdbytes + availbytes) >> SPA_MINBLOCKSHIFT;
1683 1723          statp->f_bfree = availbytes >> SPA_MINBLOCKSHIFT;
1684 1724          statp->f_bavail = statp->f_bfree; /* no root reservation */
1685 1725  
1686 1726          /*
1687 1727           * statvfs() should really be called statufs(), because it assumes
1688 1728           * static metadata.  ZFS doesn't preallocate files, so the best
1689 1729           * we can do is report the max that could possibly fit in f_files,
1690 1730           * and that minus the number actually used in f_ffree.
1691 1731           * For f_ffree, report the smaller of the number of objects available
1692 1732           * and the number of blocks (each object will take at least a block).
1693 1733           */
1694 1734          statp->f_ffree = MIN(availobjs, statp->f_bfree);
1695 1735          statp->f_favail = statp->f_ffree;       /* no "root reservation" */
1696 1736          statp->f_files = statp->f_ffree + usedobjs;
1697 1737  
1698 1738          (void) cmpldev(&d32, vfsp->vfs_dev);
1699 1739          statp->f_fsid = d32;
1700 1740  
1701 1741          /*
1702 1742           * We're a zfs filesystem.
1703 1743           */
1704 1744          (void) strcpy(statp->f_basetype, vfssw[vfsp->vfs_fstype].vsw_name);
1705 1745  
1706 1746          statp->f_flag = vf_to_stf(vfsp->vfs_flag);
1707 1747  
1708 1748          statp->f_namemax = MAXNAMELEN - 1;
1709 1749  
1710 1750          /*
1711 1751           * We have all of 32 characters to stuff a string here.
1712 1752           * Is there anything useful we could/should provide?
1713 1753           */
1714 1754          bzero(statp->f_fstr, sizeof (statp->f_fstr));
1715 1755  
1716 1756          ZFS_EXIT(zfsvfs);
1717 1757          return (0);
1718 1758  }
1719 1759  
1720 1760  static int
1721 1761  zfs_root(vfs_t *vfsp, vnode_t **vpp)
1722 1762  {
1723 1763          zfsvfs_t *zfsvfs = vfsp->vfs_data;
1724 1764          znode_t *rootzp;
1725 1765          int error;
1726 1766  
1727 1767          ZFS_ENTER(zfsvfs);
1728 1768  
1729 1769          error = zfs_zget(zfsvfs, zfsvfs->z_root, &rootzp);
1730 1770          if (error == 0)
1731 1771                  *vpp = ZTOV(rootzp);
1732 1772  
1733 1773          ZFS_EXIT(zfsvfs);
1734 1774          return (error);
1735 1775  }
1736 1776  
1737 1777  /*

1738 1778   * Teardown the zfsvfs::z_os.
1739 1779   *
1740 1780   * Note, if 'unmounting' is FALSE, we return with the 'z_teardown_lock'
1741 1781   * and 'z_teardown_inactive_lock' held.
1742 1782   */
1743 1783  static int
1744 1784  zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting)
1745 1785  {
1746 1786          znode_t *zp;
1747 1787  
     1788 +        zfs_unlinked_drain_stop_wait(zfsvfs);
1748 1789          rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
1749 1790  
1750 1791          if (!unmounting) {
1751 1792                  /*
1752 1793                   * We purge the parent filesystem's vfsp as the parent
1753 1794                   * filesystem and all of its snapshots have their vnode's
1754 1795                   * v_vfsp set to the parent's filesystem's vfsp.  Note,
1755 1796                   * 'z_parent' is self referential for non-snapshots.
1756 1797                   */
1757 1798                  (void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);

1758 1799          }
1759 1800  
1760 1801          /*
1761 1802           * Close the zil. NB: Can't close the zil while zfs_inactive
1762 1803           * threads are blocked as zil_close can call zfs_inactive.
1763 1804           */
1764 1805          if (zfsvfs->z_log) {
1765 1806                  zil_close(zfsvfs->z_log);
1766 1807                  zfsvfs->z_log = NULL;
1767 1808          }
1768 1809  
1769 1810          rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_WRITER);
1770 1811  
1771 1812          /*
1772 1813           * If we are not unmounting (ie: online recv) and someone already
1773 1814           * unmounted this file system while we were doing the switcheroo,
1774 1815           * or a reopen of z_os failed then just bail out now.
1775 1816           */
1776 1817          if (!unmounting && (zfsvfs->z_unmounted || zfsvfs->z_os == NULL)) {
1777 1818                  rw_exit(&zfsvfs->z_teardown_inactive_lock);
1778 1819                  rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
1779 1820                  return (SET_ERROR(EIO));
1780 1821          }
1781 1822  
1782 1823          /*
1783 1824           * At this point there are no vops active, and any new vops will
1784 1825           * fail with EIO since we have z_teardown_lock for writer (only
1785 1826           * relevant for forced unmount).
1786 1827           *
1787 1828           * Release all holds on dbufs.
1788 1829           */
1789 1830          mutex_enter(&zfsvfs->z_znodes_lock);
1790 1831          for (zp = list_head(&zfsvfs->z_all_znodes); zp != NULL;
1791 1832              zp = list_next(&zfsvfs->z_all_znodes, zp))
1792 1833                  if (zp->z_sa_hdl) {
1793 1834                          ASSERT(ZTOV(zp)->v_count > 0);
1794 1835                          zfs_znode_dmu_fini(zp);
1795 1836                  }
1796 1837          mutex_exit(&zfsvfs->z_znodes_lock);
1797 1838  
1798 1839          /*
1799 1840           * If we are unmounting, set the unmounted flag and let new vops
1800 1841           * unblock.  zfs_inactive will have the unmounted behavior, and all
1801 1842           * other vops will fail with EIO.
1802 1843           */
1803 1844          if (unmounting) {
1804 1845                  zfsvfs->z_unmounted = B_TRUE;
1805 1846                  rw_exit(&zfsvfs->z_teardown_inactive_lock);
1806 1847                  rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
1807 1848          }
1808 1849  
1809 1850          /*
1810 1851           * z_os will be NULL if there was an error in attempting to reopen
1811 1852           * zfsvfs, so just return as the properties had already been
1812 1853           * unregistered and cached data had been evicted before.
1813 1854           */
1814 1855          if (zfsvfs->z_os == NULL)
1815 1856                  return (0);
1816 1857  
1817 1858          /*

1818 1859           * Unregister properties.
1819 1860           */
1820 1861          zfs_unregister_callbacks(zfsvfs);
1821 1862  
1822 1863          /*
1823 1864           * Evict cached data
1824 1865           */
1825 1866          if (dsl_dataset_is_dirty(dmu_objset_ds(zfsvfs->z_os)) &&
1826 1867              !(zfsvfs->z_vfs->vfs_flag & VFS_RDONLY))
1827 1868                  txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
1828      -        dmu_objset_evict_dbufs(zfsvfs->z_os);
     1869 +        (void) dmu_objset_evict_dbufs(zfsvfs->z_os);
1829 1870  
1830 1871          return (0);
1831 1872  }
1832 1873  
1833 1874  /*ARGSUSED*/
1834 1875  static int
1835 1876  zfs_umount(vfs_t *vfsp, int fflag, cred_t *cr)
1836 1877  {
1837 1878          zfsvfs_t *zfsvfs = vfsp->vfs_data;
1838 1879          objset_t *os;

1839 1880          int ret;
1840 1881  
1841 1882          ret = secpolicy_fs_unmount(cr, vfsp);
1842 1883          if (ret) {
1843 1884                  if (dsl_deleg_access((char *)refstr_value(vfsp->vfs_resource),
1844 1885                      ZFS_DELEG_PERM_MOUNT, cr))
1845 1886                          return (ret);
1846 1887          }
1847 1888  
1848 1889          /*
1849 1890           * We purge the parent filesystem's vfsp as the parent filesystem
1850 1891           * and all of its snapshots have their vnode's v_vfsp set to the
1851 1892           * parent's filesystem's vfsp.  Note, 'z_parent' is self
1852 1893           * referential for non-snapshots.
1853 1894           */
1854 1895          (void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);
1855 1896

1856 1897          /*
1857 1898           * Unmount any snapshots mounted under .zfs before unmounting the
1858 1899           * dataset itself.
1859 1900           */
1860 1901          if (zfsvfs->z_ctldir != NULL &&
1861 1902              (ret = zfsctl_umount_snapshots(vfsp, fflag, cr)) != 0) {
1862 1903                  return (ret);
1863 1904          }
1864 1905  
1865 1906          if (!(fflag & MS_FORCE)) {
     1907 +                uint_t active_vnodes;
     1908 +
1866 1909                  /*
1867 1910                   * Check the number of active vnodes in the file system.
1868 1911                   * Our count is maintained in the vfs structure, but the
1869 1912                   * number is off by 1 to indicate a hold on the vfs
1870 1913                   * structure itself.
1871 1914                   *
1872 1915                   * The '.zfs' directory maintains a reference of its
1873 1916                   * own, and any active references underneath are
1874 1917                   * reflected in the vnode count.
     1918 +                 *
     1919 +                 * Active vnodes: vnodes that are held by a user
1875 1920                   */
     1921 +
     1922 +                active_vnodes =
     1923 +                    vfsp->vfs_count - zfsvfs->z_znodes_freeing_cnt;
     1924 +
1876 1925                  if (zfsvfs->z_ctldir == NULL) {
1877      -                        if (vfsp->vfs_count > 1)
     1926 +                        if (active_vnodes > 1)
1878 1927                                  return (SET_ERROR(EBUSY));
1879 1928                  } else {
1880      -                        if (vfsp->vfs_count > 2 ||
     1929 +                        if (active_vnodes > 2 ||
1881 1930                              zfsvfs->z_ctldir->v_count > 1)
1882 1931                                  return (SET_ERROR(EBUSY));
1883 1932                  }
1884 1933          }
1885 1934  
1886 1935          vfsp->vfs_flag |= VFS_UNMOUNTED;
1887 1936  
1888 1937          VERIFY(zfsvfs_teardown(zfsvfs, B_TRUE) == 0);
1889 1938          os = zfsvfs->z_os;
1890 1939

1891 1940          /*
1892 1941           * z_os will be NULL if there was an error in
1893 1942           * attempting to reopen zfsvfs.
1894 1943           */
1895 1944          if (os != NULL) {
1896 1945                  /*
1897 1946                   * Unset the objset user_ptr.
1898 1947                   */
1899 1948                  mutex_enter(&os->os_user_ptr_lock);
1900 1949                  dmu_objset_set_user(os, NULL);
1901 1950                  mutex_exit(&os->os_user_ptr_lock);
1902 1951  
1903 1952                  /*
1904 1953                   * Finally release the objset
1905 1954                   */
1906 1955                  dmu_objset_disown(os, zfsvfs);
1907 1956          }
1908 1957  
1909 1958          /*
1910 1959           * We can now safely destroy the '.zfs' directory node.
1911 1960           */
1912 1961          if (zfsvfs->z_ctldir != NULL)
1913 1962                  zfsctl_destroy(zfsvfs);
1914 1963  
1915 1964          return (0);
1916 1965  }
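
The non-forced busy check added to zfs_umount() can be read as a small predicate. The sketch below is an illustration only: the function name and the flattened parameter list are hypothetical, standing in for vfsp->vfs_count, zfsvfs->z_znodes_freeing_cnt, and the z_ctldir vnode count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the EBUSY test in zfs_umount().
 * Vnodes that are in the middle of being freed are subtracted from
 * the vfs hold count before comparing against the expected baseline.
 */
static bool
unmount_is_busy(unsigned int vfs_count, unsigned int freeing_cnt,
    bool has_ctldir, unsigned int ctldir_count)
{
	unsigned int active_vnodes = vfs_count - freeing_cnt;

	if (!has_ctldir) {
		/* One hold is accounted to the vfs structure itself. */
		return (active_vnodes > 1);
	}

	/* The '.zfs' directory adds one more baseline hold. */
	return (active_vnodes > 2 || ctldir_count > 1);
}
```

Because the subtraction is unsigned, the sketch relies on the freeing count never exceeding the hold count, as the kernel code implicitly does.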
1917 1966  
1918 1967  static int
1919 1968  zfs_vget(vfs_t *vfsp, vnode_t **vpp, fid_t *fidp)
1920 1969  {
1921 1970          zfsvfs_t        *zfsvfs = vfsp->vfs_data;
1922 1971          znode_t         *zp;
1923 1972          uint64_t        object = 0;
1924 1973          uint64_t        fid_gen = 0;
1925 1974          uint64_t        gen_mask;
1926 1975          uint64_t        zp_gen;
1927 1976          int             i, err;
1928 1977  
1929 1978          *vpp = NULL;
1930 1979  
1931 1980          ZFS_ENTER(zfsvfs);
1932 1981  
1933 1982          if (fidp->fid_len == LONG_FID_LEN) {
1934 1983                  zfid_long_t     *zlfid = (zfid_long_t *)fidp;
1935 1984                  uint64_t        objsetid = 0;
1936 1985                  uint64_t        setgen = 0;
1937 1986  
1938 1987                  for (i = 0; i < sizeof (zlfid->zf_setid); i++)
1939 1988                          objsetid |= ((uint64_t)zlfid->zf_setid[i]) << (8 * i);
1940 1989  
1941 1990                  for (i = 0; i < sizeof (zlfid->zf_setgen); i++)
1942 1991                          setgen |= ((uint64_t)zlfid->zf_setgen[i]) << (8 * i);
1943 1992  
1944 1993                  ZFS_EXIT(zfsvfs);
1945 1994  
1946 1995                  err = zfsctl_lookup_objset(vfsp, objsetid, &zfsvfs);
1947 1996                  if (err)
1948 1997                          return (SET_ERROR(EINVAL));
1949 1998                  ZFS_ENTER(zfsvfs);
1950 1999          }
1951 2000  
1952 2001          if (fidp->fid_len == SHORT_FID_LEN || fidp->fid_len == LONG_FID_LEN) {
1953 2002                  zfid_short_t    *zfid = (zfid_short_t *)fidp;
1954 2003  
1955 2004                  for (i = 0; i < sizeof (zfid->zf_object); i++)
1956 2005                          object |= ((uint64_t)zfid->zf_object[i]) << (8 * i);
1957 2006  
1958 2007                  for (i = 0; i < sizeof (zfid->zf_gen); i++)
1959 2008                          fid_gen |= ((uint64_t)zfid->zf_gen[i]) << (8 * i);
1960 2009          } else {
1961 2010                  ZFS_EXIT(zfsvfs);
1962 2011                  return (SET_ERROR(EINVAL));
1963 2012          }
1964 2013  
1965 2014          /* A zero fid_gen means we are in the .zfs control directories */
1966 2015          if (fid_gen == 0 &&
1967 2016              (object == ZFSCTL_INO_ROOT || object == ZFSCTL_INO_SNAPDIR)) {
1968 2017                  *vpp = zfsvfs->z_ctldir;
1969 2018                  ASSERT(*vpp != NULL);
1970 2019                  if (object == ZFSCTL_INO_SNAPDIR) {
1971 2020                          VERIFY(zfsctl_root_lookup(*vpp, "snapshot", vpp, NULL,
1972 2021                              0, NULL, NULL, NULL, NULL, NULL) == 0);
1973 2022                  } else {
1974 2023                          VN_HOLD(*vpp);
1975 2024                  }
1976 2025                  ZFS_EXIT(zfsvfs);
1977 2026                  return (0);
1978 2027          }
1979 2028  
1980 2029          gen_mask = -1ULL >> (64 - 8 * i);
1981 2030  
1982 2031          dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
1983 2032          if (err = zfs_zget(zfsvfs, object, &zp)) {
1984 2033                  ZFS_EXIT(zfsvfs);
1985 2034                  return (err);
1986 2035          }
1987 2036          (void) sa_lookup(zp->z_sa_hdl, SA_ZPL_GEN(zfsvfs), &zp_gen,
1988 2037              sizeof (uint64_t));
1989 2038          zp_gen = zp_gen & gen_mask;
1990 2039          if (zp_gen == 0)
1991 2040                  zp_gen = 1;
1992 2041          if (zp->z_unlinked || zp_gen != fid_gen) {
1993 2042                  dprintf("znode gen (%u) != fid gen (%u)\n", zp_gen, fid_gen);
1994 2043                  VN_RELE(ZTOV(zp));
1995 2044                  ZFS_EXIT(zfsvfs);
1996 2045                  return (SET_ERROR(EINVAL));
1997 2046          }
1998 2047  
1999 2048          *vpp = ZTOV(zp);
2000 2049          ZFS_EXIT(zfsvfs);
2001 2050          return (0);
2002 2051  }
2003 2052  
2004 2053  /*
2005 2054   * Block out VOPs and close zfsvfs_t::z_os
2006 2055   *
2007 2056   * Note, if successful, then we return with the 'z_teardown_lock' and
2008 2057   * 'z_teardown_inactive_lock' write held.  We leave ownership of the underlying
2009 2058   * dataset and objset intact so that they can be atomically handed off during
2010 2059   * a subsequent rollback or recv operation and the resume thereafter.
2011 2060   */
2012 2061  int
2013 2062  zfs_suspend_fs(zfsvfs_t *zfsvfs)
2014 2063  {
2015 2064          int error;
2016 2065  
2017      -        if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0)
     2066 +        mutex_enter(&zfsvfs->z_lock);
     2067 +        if (zfsvfs->z_busy) {
     2068 +                mutex_exit(&zfsvfs->z_lock);
     2069 +                return (SET_ERROR(EBUSY));
     2070 +        }
     2071 +        zfsvfs->z_busy = B_TRUE;
     2072 +        mutex_exit(&zfsvfs->z_lock);
     2073 +
     2074 +        if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0) {
     2075 +                mutex_enter(&zfsvfs->z_lock);
     2076 +                zfsvfs->z_busy = B_FALSE;
     2077 +                mutex_exit(&zfsvfs->z_lock);
2018 2078                  return (error);
     2079 +        }
2019 2080  
2020 2081          return (0);
2021 2082  }
2022 2083  
2023 2084  /*
2024 2085   * Rebuild SA and release VOPs.  Note that ownership of the underlying dataset
2025 2086   * is an invariant across any of the operations that can be performed while the
2026 2087   * filesystem was suspended.  Whether it succeeded or failed, the preconditions
2027 2088   * are the same: the relevant objset and associated dataset are owned by
2028 2089   * zfsvfs, held, and long held on entry.
2029 2090   */
2030 2091  int
2031 2092  zfs_resume_fs(zfsvfs_t *zfsvfs, dsl_dataset_t *ds)
2032 2093  {
2033 2094          int err;
2034 2095          znode_t *zp;
2035 2096  
2036 2097          ASSERT(RRM_WRITE_HELD(&zfsvfs->z_teardown_lock));
2037 2098          ASSERT(RW_WRITE_HELD(&zfsvfs->z_teardown_inactive_lock));
2038 2099  
2039 2100          /*
2040 2101           * We already own this, so just update the objset_t, as the one we
2041 2102           * had before may have been evicted.
2042 2103           */
2043 2104          objset_t *os;
2044 2105          VERIFY3P(ds->ds_owner, ==, zfsvfs);
2045 2106          VERIFY(dsl_dataset_long_held(ds));
2046 2107          VERIFY0(dmu_objset_from_ds(ds, &os));
2047 2108  
2048 2109          err = zfsvfs_init(zfsvfs, os);
2049 2110          if (err != 0)
2050 2111                  goto bail;
2051 2112  
2052 2113          VERIFY(zfsvfs_setup(zfsvfs, B_FALSE) == 0);
2053 2114  
2054 2115          zfs_set_fuid_feature(zfsvfs);
2055 2116  
2056 2117          /*
2057 2118           * Attempt to re-establish all the active znodes with
2058 2119           * their dbufs.  If a zfs_rezget() fails, then we'll let
2059 2120           * any potential callers discover that via ZFS_ENTER_VERIFY_VP
2060 2121           * when they try to use their znode.
2061 2122           */
2062 2123          mutex_enter(&zfsvfs->z_znodes_lock);
2063 2124          for (zp = list_head(&zfsvfs->z_all_znodes); zp;
2064 2125              zp = list_next(&zfsvfs->z_all_znodes, zp)) {
2065 2126                  (void) zfs_rezget(zp);
2066 2127          }
2067 2128          mutex_exit(&zfsvfs->z_znodes_lock);
2068 2129  
     2130 +        if (((zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) == 0) &&
     2131 +            !zfsvfs->z_unmounted) {
     2132 +                /*
     2133 +                 * zfs_suspend_fs() could have interrupted freeing
     2134 +                 * of dnodes. We need to restart this freeing so
     2135 +                 * that we don't "leak" the space.
     2136 +                 */
     2137 +                zfs_unlinked_drain(zfsvfs);
     2138 +        }
     2139 +
2069 2140  bail:
2070 2141          /* release the VOPs */
2071 2142          rw_exit(&zfsvfs->z_teardown_inactive_lock);
2072 2143          rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
2073 2144  
2074 2145          if (err) {
2075 2146                  /*
2076 2147                   * Since we couldn't setup the sa framework, try to force
2077 2148                   * unmount this file system.
2078 2149                   */
2079 2150                  if (vn_vfswlock(zfsvfs->z_vfs->vfs_vnodecovered) == 0)
2080 2151                          (void) dounmount(zfsvfs->z_vfs, MS_FORCE, CRED());
2081 2152          }
     2153 +        mutex_enter(&zfsvfs->z_lock);
     2154 +        zfsvfs->z_busy = B_FALSE;
     2155 +        mutex_exit(&zfsvfs->z_lock);
     2156 +
2082 2157          return (err);
2083 2158  }
2084 2159  
2085 2160  static void
2086 2161  zfs_freevfs(vfs_t *vfsp)
2087 2162  {
2088 2163          zfsvfs_t *zfsvfs = vfsp->vfs_data;
2089 2164  
2090 2165          /*
2091 2166           * If this is a snapshot, we have an extra VFS_HOLD on our parent
2092 2167           * from zfs_mount().  Release it here.  If we came through
2093 2168           * zfs_mountroot() instead, we didn't grab an extra hold, so
2094 2169           * skip the VFS_RELE for rootvfs.
2095 2170           */
2096 2171          if (zfsvfs->z_issnap && (vfsp != rootvfs))
2097 2172                  VFS_RELE(zfsvfs->z_parent->z_vfs);
2098 2173  
2099 2174          zfsvfs_free(zfsvfs);
2100 2175  
2101 2176          atomic_dec_32(&zfs_active_fs_count);
2102 2177  }
2103 2178  
2104 2179  /*
2105 2180   * VFS_INIT() initialization.  Note that there is no VFS_FINI(),
2106 2181   * so we can't safely do any non-idempotent initialization here.
2107 2182   * Leave that to zfs_init() and zfs_fini(), which are called
2108 2183   * from the module's _init() and _fini() entry points.
2109 2184   */
2110 2185  /*ARGSUSED*/
2111 2186  static int
2112 2187  zfs_vfsinit(int fstype, char *name)
2113 2188  {
2114 2189          int error;
2115 2190  
2116 2191          zfsfstype = fstype;
2117 2192  
2118 2193          /*
2119 2194           * Setup vfsops and vnodeops tables.
2120 2195           */
2121 2196          error = vfs_setfsops(fstype, zfs_vfsops_template, &zfs_vfsops);
2122 2197          if (error != 0) {
2123 2198                  cmn_err(CE_WARN, "zfs: bad vfs ops template");
2124 2199          }
2125 2200  
2126 2201          error = zfs_create_op_tables();
2127 2202          if (error) {
2128 2203                  zfs_remove_op_tables();
2129 2204                  cmn_err(CE_WARN, "zfs: bad vnode ops template");
2130 2205                  (void) vfs_freevfsops_by_type(zfsfstype);
2131 2206                  return (error);
2132 2207          }
2133 2208  
2134 2209          mutex_init(&zfs_dev_mtx, NULL, MUTEX_DEFAULT, NULL);
2135 2210  
2136 2211          /*
2137 2212           * Unique major number for all zfs mounts.
2138 2213           * If we run out of 32-bit minors, we'll getudev() another major.
2139 2214           */
2140 2215          zfs_major = ddi_name_to_major(ZFS_DRIVER);
2141 2216          zfs_minor = ZFS_MIN_MINOR;
2142 2217  
2143 2218          return (0);
2144 2219  }
2145 2220  
2146 2221  void
2147 2222  zfs_init(void)
2148 2223  {
2149 2224          /*
2150 2225           * Initialize .zfs directory structures
2151 2226           */
2152 2227          zfsctl_init();
2153 2228  
2154 2229          /*
2155 2230           * Initialize znode cache, vnode ops, etc...
2156 2231           */
2157 2232          zfs_znode_init();
2158 2233  
2159 2234          dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
2160 2235  }
2161 2236  
2162 2237  void
2163 2238  zfs_fini(void)
2164 2239  {
2165 2240          zfsctl_fini();
2166 2241          zfs_znode_fini();
2167 2242  }
2168 2243  
2169 2244  int
2170 2245  zfs_busy(void)
2171 2246  {
2172 2247          return (zfs_active_fs_count != 0);
2173 2248  }
2174 2249  
2175 2250  int
2176 2251  zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers)
2177 2252  {
2178 2253          int error;
2179 2254          objset_t *os = zfsvfs->z_os;
2180 2255          dmu_tx_t *tx;
2181 2256  
2182 2257          if (newvers < ZPL_VERSION_INITIAL || newvers > ZPL_VERSION)
2183 2258                  return (SET_ERROR(EINVAL));
2184 2259  
2185 2260          if (newvers < zfsvfs->z_version)
2186 2261                  return (SET_ERROR(EINVAL));
2187 2262  
2188 2263          if (zfs_spa_version_map(newvers) >
2189 2264              spa_version(dmu_objset_spa(zfsvfs->z_os)))
2190 2265                  return (SET_ERROR(ENOTSUP));
2191 2266  
2192 2267          tx = dmu_tx_create(os);
2193 2268          dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_FALSE, ZPL_VERSION_STR);
2194 2269          if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) {
2195 2270                  dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE,
2196 2271                      ZFS_SA_ATTRS);
2197 2272                  dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL);
2198 2273          }
2199 2274          error = dmu_tx_assign(tx, TXG_WAIT);
2200 2275          if (error) {
2201 2276                  dmu_tx_abort(tx);
2202 2277                  return (error);
2203 2278          }
2204 2279  
2205 2280          error = zap_update(os, MASTER_NODE_OBJ, ZPL_VERSION_STR,
2206 2281              8, 1, &newvers, tx);
2207 2282  
2208 2283          if (error) {
2209 2284                  dmu_tx_commit(tx);
2210 2285                  return (error);
2211 2286          }
2212 2287  
2213 2288          if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) {
2214 2289                  uint64_t sa_obj;
2215 2290  
2216 2291                  ASSERT3U(spa_version(dmu_objset_spa(zfsvfs->z_os)), >=,
2217 2292                      SPA_VERSION_SA);
2218 2293                  sa_obj = zap_create(os, DMU_OT_SA_MASTER_NODE,
2219 2294                      DMU_OT_NONE, 0, tx);
2220 2295  
2221 2296                  error = zap_add(os, MASTER_NODE_OBJ,
2222 2297                      ZFS_SA_ATTRS, 8, 1, &sa_obj, tx);
2223 2298                  ASSERT0(error);
2224 2299
2225 2300                  VERIFY(0 == sa_set_sa_object(os, sa_obj));
2226 2301                  sa_register_update_callback(os, zfs_sa_upgrade);
2227 2302          }
2228 2303  
2229 2304          spa_history_log_internal_ds(dmu_objset_ds(os), "upgrade", tx,
2230 2305              "from %llu to %llu", zfsvfs->z_version, newvers);
2231 2306  
2232 2307          dmu_tx_commit(tx);
2233 2308  
2234 2309          zfsvfs->z_version = newvers;
     2310 +        os->os_version = newvers;
2235 2311  
2236 2312          zfs_set_fuid_feature(zfsvfs);
2237 2313  
2238 2314          return (0);
2239 2315  }
2240 2316  
2241 2317  /*
2242 2318   * Read a property stored within the master node.
2243 2319   */
2244 2320  int
2245 2321  zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value)
2246 2322  {
2247      -        const char *pname;
2248      -        int error = ENOENT;
     2323 +        uint64_t *cached_copy = NULL;
2249 2324  
2250 2325          /*
2251      -         * Look up the file system's value for the property.  For the
2252      -         * version property, we look up a slightly different string.
     2326 +         * Figure out where in the objset_t the cached copy would live, if it
     2327 +         * is available for the requested property.
2253 2328           */
2254      -        if (prop == ZFS_PROP_VERSION)
     2329 +        if (os != NULL) {
     2330 +                switch (prop) {
     2331 +                case ZFS_PROP_VERSION:
     2332 +                        cached_copy = &os->os_version;
     2333 +                        break;
     2334 +                case ZFS_PROP_NORMALIZE:
     2335 +                        cached_copy = &os->os_normalization;
     2336 +                        break;
     2337 +                case ZFS_PROP_UTF8ONLY:
     2338 +                        cached_copy = &os->os_utf8only;
     2339 +                        break;
     2340 +                case ZFS_PROP_CASE:
     2341 +                        cached_copy = &os->os_casesensitivity;
     2342 +                        break;
     2343 +                default:
     2344 +                        break;
     2345 +                }
     2346 +        }
     2347 +        if (cached_copy != NULL && *cached_copy != OBJSET_PROP_UNINITIALIZED) {
     2348 +                *value = *cached_copy;
     2349 +                return (0);
     2350 +        }
     2351 +
     2352 +        /*
     2353 +         * If the property wasn't cached, look up the file system's value for
     2354 +         * the property. For the version property, we look up a slightly
     2355 +         * different string.
     2356 +         */
     2357 +        const char *pname;
     2358 +        int error = ENOENT;
     2359 +        if (prop == ZFS_PROP_VERSION) {
2255 2360                  pname = ZPL_VERSION_STR;
2256      -        else
     2361 +        } else {
2257 2362                  pname = zfs_prop_to_name(prop);
     2363 +        }
2258 2364  
2259 2365          if (os != NULL) {
2260 2366                  ASSERT3U(os->os_phys->os_type, ==, DMU_OST_ZFS);
2261 2367                  error = zap_lookup(os, MASTER_NODE_OBJ, pname, 8, 1, value);
2262 2368          }
2263 2369  
2264 2370          if (error == ENOENT) {
2265 2371                  /* No value set, use the default value */
2266 2372                  switch (prop) {
2267 2373                  case ZFS_PROP_VERSION:
2268 2374                          *value = ZPL_VERSION;
2269 2375                          break;
2270 2376                  case ZFS_PROP_NORMALIZE:
2271 2377                  case ZFS_PROP_UTF8ONLY:
2272 2378                          *value = 0;
2273 2379                          break;
2274 2380                  case ZFS_PROP_CASE:
2275 2381                          *value = ZFS_CASE_SENSITIVE;
2276 2382                          break;
2277 2383                  default:
2278 2384                          return (error);
2279 2385                  }
2280 2386                  error = 0;
2281 2387          }
     2388 +
     2389 +        /*
     2390 +         * If one of the methods for getting the property value above worked,
     2391 +         * copy it into the objset_t's cache.
     2392 +         */
     2393 +        if (error == 0 && cached_copy != NULL) {
     2394 +                *cached_copy = *value;
     2395 +        }
     2396 +
2282 2397          return (error);
2283 2398  }
2284 2399  
2285 2400  /*
2286 2401   * Return true if the corresponding vfs's unmounted flag is set.
2287 2402   * Otherwise return false.
2288 2403   * If this function returns true we know VFS unmount has been initiated.
2289 2404   */
2290 2405  boolean_t
2291 2406  zfs_get_vfs_flag_unmounted(objset_t *os)
2292 2407  {
2293 2408          zfsvfs_t *zfvp;
2294 2409          boolean_t unmounted = B_FALSE;
2295 2410  
2296 2411          ASSERT(dmu_objset_type(os) == DMU_OST_ZFS);
2297 2412  
2298 2413          mutex_enter(&os->os_user_ptr_lock);
2299 2414          zfvp = dmu_objset_get_user(os);
2300 2415          if (zfvp != NULL && zfvp->z_vfs != NULL &&
2301 2416              (zfvp->z_vfs->vfs_flag & VFS_UNMOUNTED))
2302 2417                  unmounted = B_TRUE;
2303 2418          mutex_exit(&os->os_user_ptr_lock);
2304 2419  
2305 2420          return (unmounted);
2306 2421  }
2307 2422  
2308 2423  static vfsdef_t vfw = {
2309 2424          VFSDEF_VERSION,
2310 2425          MNTTYPE_ZFS,
2311 2426          zfs_vfsinit,
2312 2427          VSW_HASPROTO|VSW_CANRWRO|VSW_CANREMOUNT|VSW_VOLATILEDEV|VSW_STATS|
2313 2428              VSW_XID|VSW_ZMOUNT,
2314 2429          &zfs_mntopts
2315 2430  };
2316 2431  
2317 2432  struct modlfs zfs_modlfs = {
2318 2433          &mod_fsops, "ZFS filesystem version " SPA_VERSION_STRING, &vfw
2319 2434  };
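
The zfs_get_zplprop() change in this diff follows a read-through cache
pattern: return the cached copy if it has been initialized, otherwise do the
slow lookup and store the result for next time. The user-space sketch below
illustrates only that pattern. It is not the kernel code: prop_cache_t,
PROP_UNINITIALIZED, slow_lookup() and get_version() are hypothetical
stand-ins for objset_t, OBJSET_PROP_UNINITIALIZED, zap_lookup() and
zfs_get_zplprop(), and the value 5 is arbitrary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sentinel, standing in for OBJSET_PROP_UNINITIALIZED. */
#define	PROP_UNINITIALIZED	UINT64_MAX

/* Hypothetical stand-in for the cached property fields in objset_t. */
typedef struct prop_cache {
	uint64_t pc_version;
} prop_cache_t;

/* Counts trips to the (simulated) backing store. */
static int slow_lookups = 0;

/* Hypothetical stand-in for zap_lookup(): always succeeds with value 5. */
static int
slow_lookup(uint64_t *value)
{
	slow_lookups++;
	*value = 5;
	return (0);
}

/*
 * Read-through lookup: serve from the cache when it holds a value,
 * otherwise fall back to the slow path and populate the cache on success.
 * A NULL cache is allowed and simply bypasses caching.
 */
static int
get_version(prop_cache_t *pc, uint64_t *value)
{
	uint64_t *cached_copy = (pc != NULL) ? &pc->pc_version : NULL;
	int error;

	if (cached_copy != NULL && *cached_copy != PROP_UNINITIALIZED) {
		*value = *cached_copy;
		return (0);
	}

	error = slow_lookup(value);
	if (error == 0 && cached_copy != NULL)
		*cached_copy = *value;

	return (error);
}
```

With a cache initialized to PROP_UNINITIALIZED, the first get_version() call
takes the slow path and every later call is answered from pc_version. This
also suggests why the diff updates os->os_version in zfs_set_version():
without that assignment the cached copy would presumably keep returning the
old version after an upgrade.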
