big-one Wdiff usr/src/uts/common/fs/zfs/dmu_objset.c

NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3562 filename normalization doesn't work for removes (sync with upstream)
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
6160 /usr/lib/fs/zfs/bootinstall should use bootadm
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Adam Števko <adam.stevko@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (NULL is not an int)
6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5610 zfs clone from different source and target pools produces coredump
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3558 KRRP Integration
SUP-507 Delete or truncate of large files delayed on datasets with small recordsize
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issues #7: Reconsile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12619 rb4429 More dp->dp_config_rwlock holds

          --- old/usr/src/uts/common/fs/zfs/dmu_objset.c
          +++ new/usr/src/uts/common/fs/zfs/dmu_objset.c

   1    1  /*
   2    2   * CDDL HEADER START
   3    3   *
   4    4   * The contents of this file are subject to the terms of the
   5    5   * Common Development and Distribution License (the "License").
   6    6   * You may not use this file except in compliance with the License.
   7    7   *
   8    8   * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9    9   * or http://www.opensolaris.org/os/licensing.
  10   10   * See the License for the specific language governing permissions
  11   11   * and limitations under the License.
  12   12   *
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  
  22   22  /*
  23   23   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24   24   * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
  25   25   * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  26   26   * Copyright (c) 2013, Joyent, Inc. All rights reserved.
  27   27   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  28   28   * Copyright (c) 2015, STRATO AG, Inc. All rights reserved.
  29   29   * Copyright (c) 2014 Integros [integros.com]
  30   30   * Copyright 2017 Nexenta Systems, Inc.
  31   31   */
  32   32  
  33   33  /* Portions Copyright 2010 Robert Milkowski */
  34   34  
  35   35  #include <sys/cred.h>
  36   36  #include <sys/zfs_context.h>
  37   37  #include <sys/dmu_objset.h>
  38   38  #include <sys/dsl_dir.h>
  39   39  #include <sys/dsl_dataset.h>
  40   40  #include <sys/dsl_prop.h>
  41   41  #include <sys/dsl_pool.h>
  42   42  #include <sys/dsl_synctask.h>
  43   43  #include <sys/dsl_deleg.h>
  44   44  #include <sys/dnode.h>
  45   45  #include <sys/dbuf.h>
  46   46  #include <sys/zvol.h>
  47   47  #include <sys/dmu_tx.h>
  48   48  #include <sys/zap.h>
  49   49  #include <sys/zil.h>
  50   50  #include <sys/dmu_impl.h>
  51   51  #include <sys/zfs_ioctl.h>
  52   52  #include <sys/sa.h>
  53   53  #include <sys/zfs_onexit.h>
  54   54  #include <sys/dsl_destroy.h>
  55   55  #include <sys/vdev.h>
  56      -#include <sys/zfeature.h>
       56 +#include <sys/wbc.h>
  57   57  
  58   58  /*
  59   59   * Needed to close a window in dnode_move() that allows the objset to be freed
  60   60   * before it can be safely accessed.
  61   61   */
  62   62  krwlock_t os_lock;
  63   63  
       64 +extern kmem_cache_t *zfs_ds_collector_cache;
       65 +
  64   66  /*
  65   67   * Tunable to overwrite the maximum number of threads for the parallization
  66   68   * of dmu_objset_find_dp, needed to speed up the import of pools with many
  67   69   * datasets.
  68   70   * Default is 4 times the number of leaf vdevs.
  69   71   */
  70   72  int dmu_find_threads = 0;
  71   73  
  72   74  /*
  73   75   * Backfill lower metadnode objects after this many have been freed.
  74   76   * Backfilling negatively impacts object creation rates, so only do it
  75   77   * if there are enough holes to fill.
  76   78   */
  77   79  int dmu_rescan_dnode_threshold = 131072;
  78   80  
  79   81  static void dmu_objset_find_dp_cb(void *arg);
  80   82  
       83 +/* ARGSUSED */
       84 +static int
       85 +zfs_ds_collector_constructor(void *ds_el, void *unused, int flags)
       86 +{
       87 +        bzero(ds_el, sizeof (zfs_ds_collector_entry_t));
       88 +        return (0);
       89 +}
       90 +
  81   91  void
  82   92  dmu_objset_init(void)
  83   93  {
       94 +        zfs_ds_collector_cache = kmem_cache_create("zfs_ds_collector_cache",
       95 +            sizeof (zfs_ds_collector_entry_t),
       96 +            8, zfs_ds_collector_constructor,
       97 +            NULL, NULL, NULL, NULL, 0);
  84   98          rw_init(&os_lock, NULL, RW_DEFAULT, NULL);
  85   99  }
  86  100  
  87  101  void
  88  102  dmu_objset_fini(void)
  89  103  {
  90  104          rw_destroy(&os_lock);
      105 +        kmem_cache_destroy(zfs_ds_collector_cache);
  91  106  }
  92  107  
  93  108  spa_t *
  94  109  dmu_objset_spa(objset_t *os)
  95  110  {
  96  111          return (os->os_spa);
  97  112  }
  98  113  
  99  114  zilog_t *
 100  115  dmu_objset_zil(objset_t *os)
 101  116  {
 102  117          return (os->os_zil);
 103  118  }
 104  119  
 105  120  dsl_pool_t *
 106  121  dmu_objset_pool(objset_t *os)
 107  122  {
 108  123          dsl_dataset_t *ds;
 109  124  
 110  125          if ((ds = os->os_dsl_dataset) != NULL && ds->ds_dir)
 111  126                  return (ds->ds_dir->dd_pool);
 112  127          else
 113  128                  return (spa_get_dsl(os->os_spa));
 114  129  }
 115  130  
 116  131  dsl_dataset_t *
 117  132  dmu_objset_ds(objset_t *os)
 118  133  {
 119  134          return (os->os_dsl_dataset);
 120  135  }
 121  136  
 122  137  dmu_objset_type_t
 123  138  dmu_objset_type(objset_t *os)
 124  139  {
 125  140          return (os->os_phys->os_type);
 126  141  }
 127  142  
 128  143  void
 129  144  dmu_objset_name(objset_t *os, char *buf)
 130  145  {
 131  146          dsl_dataset_name(os->os_dsl_dataset, buf);
 132  147  }
 133  148  
 134  149  uint64_t
 135  150  dmu_objset_id(objset_t *os)
 136  151  {
 137  152          dsl_dataset_t *ds = os->os_dsl_dataset;
 138  153  
 139  154          return (ds ? ds->ds_object : 0);
 140  155  }
 141  156  
 142  157  zfs_sync_type_t
 143  158  dmu_objset_syncprop(objset_t *os)
 144  159  {
 145  160          return (os->os_sync);
 146  161  }
 147  162  
 148  163  zfs_logbias_op_t
 149  164  dmu_objset_logbias(objset_t *os)
 150  165  {
 151  166          return (os->os_logbias);
 152  167  }
 153  168  
 154  169  static void
 155  170  checksum_changed_cb(void *arg, uint64_t newval)
 156  171  {
 157  172          objset_t *os = arg;
 158  173  
 159  174          /*
 160  175           * Inheritance should have been done by now.
 161  176           */
 162  177          ASSERT(newval != ZIO_CHECKSUM_INHERIT);
 163  178  
 164  179          os->os_checksum = zio_checksum_select(newval, ZIO_CHECKSUM_ON_VALUE);
 165  180  }
 166  181  
 167  182  static void
 168  183  compression_changed_cb(void *arg, uint64_t newval)
 169  184  {
 170  185          objset_t *os = arg;
 171  186
 172  187          /*
 173  188           * Inheritance and range checking should have been done by now.
 174  189           */
 175  190          ASSERT(newval != ZIO_COMPRESS_INHERIT);
 176  191  
 177  192          os->os_compress = zio_compress_select(os->os_spa, newval,
 178  193              ZIO_COMPRESS_ON);
 179  194  }
 180  195  
 181  196  static void
      197 +smartcomp_changed_cb(void *arg, uint64_t newval)
      198 +{
      199 +        objset_t *os = arg;
      200 +
      201 +        os->os_smartcomp_enabled = newval ? B_TRUE : B_FALSE;
      202 +}
      203 +
      204 +static void
 182  205  copies_changed_cb(void *arg, uint64_t newval)
 183  206  {
 184  207          objset_t *os = arg;
 185  208  
 186  209          /*
 187  210           * Inheritance and range checking should have been done by now.
 188  211           */
 189  212          ASSERT(newval > 0);
 190  213          ASSERT(newval <= spa_max_replication(os->os_spa));
 191  214
 192  215          os->os_copies = newval;
 193  216  }
 194  217  
 195  218  static void
 196  219  dedup_changed_cb(void *arg, uint64_t newval)
 197  220  {
 198  221          objset_t *os = arg;
 199  222          spa_t *spa = os->os_spa;
 200  223          enum zio_checksum checksum;
 201  224  
 202  225          /*
 203  226           * Inheritance should have been done by now.
 204  227           */
 205  228          ASSERT(newval != ZIO_CHECKSUM_INHERIT);
 206  229  
 207  230          checksum = zio_checksum_dedup_select(spa, newval, ZIO_CHECKSUM_OFF);
 208  231  
 209  232          os->os_dedup_checksum = checksum & ZIO_CHECKSUM_MASK;
 210  233          os->os_dedup_verify = !!(checksum & ZIO_CHECKSUM_VERIFY);
 211  234  }
 212  235  
 213  236  static void
 214  237  primary_cache_changed_cb(void *arg, uint64_t newval)
 215  238  {
 216  239          objset_t *os = arg;
 217  240  
 218  241          /*
 219  242           * Inheritance and range checking should have been done by now.
 220  243           */
 221  244          ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE ||
 222  245              newval == ZFS_CACHE_METADATA);
 223  246  
 224  247          os->os_primary_cache = newval;
 225  248  }
 226  249  
 227  250  static void
 228  251  secondary_cache_changed_cb(void *arg, uint64_t newval)
 229  252  {
 230  253          objset_t *os = arg;
 231  254  
 232  255          /*
 233  256           * Inheritance and range checking should have been done by now.
 234  257           */
 235  258          ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE ||
 236      -            newval == ZFS_CACHE_METADATA);
      259 +            newval == ZFS_CACHE_METADATA || newval == ZFS_CACHE_DATA);
 237  260  
 238  261          os->os_secondary_cache = newval;
 239  262  }
 240  263  
 241  264  static void
      265 +zpl_meta_placement_changed_cb(void *arg, uint64_t newval)
      266 +{
      267 +        objset_t *os = arg;
      268 +
      269 +        os->os_zpl_meta_to_special = newval;
      270 +}
      271 +
      272 +static void
 242  273  sync_changed_cb(void *arg, uint64_t newval)
 243  274  {
 244  275          objset_t *os = arg;
 245  276  
 246  277          /*
 247  278           * Inheritance and range checking should have been done by now.
 248  279           */
 249  280          ASSERT(newval == ZFS_SYNC_STANDARD || newval == ZFS_SYNC_ALWAYS ||
 250  281              newval == ZFS_SYNC_DISABLED);
 251  282
 252  283          os->os_sync = newval;
 253  284          if (os->os_zil)
 254  285                  zil_set_sync(os->os_zil, newval);
 255  286  }
 256  287  
 257  288  static void
 258  289  redundant_metadata_changed_cb(void *arg, uint64_t newval)
 259  290  {
 260  291          objset_t *os = arg;
 261  292  
 262  293          /*
 263  294           * Inheritance and range checking should have been done by now.
 264  295           */
 265  296          ASSERT(newval == ZFS_REDUNDANT_METADATA_ALL ||
 266  297              newval == ZFS_REDUNDANT_METADATA_MOST);
 267  298  
 268  299          os->os_redundant_metadata = newval;
 269  300  }
 270  301  
 271  302  static void
 272  303  logbias_changed_cb(void *arg, uint64_t newval)
 273  304  {
 274  305          objset_t *os = arg;
 275  306  
 276  307          ASSERT(newval == ZFS_LOGBIAS_LATENCY ||
 277  308              newval == ZFS_LOGBIAS_THROUGHPUT);
 278  309          os->os_logbias = newval;
 279  310          if (os->os_zil)
 280  311                  zil_set_logbias(os->os_zil, newval);
 281  312  }
 282  313  
 283  314  static void
 284  315  recordsize_changed_cb(void *arg, uint64_t newval)
 285  316  {
 286  317          objset_t *os = arg;
 287  318  
 288  319          os->os_recordsize = newval;
 289  320  }
 290  321  
 291  322  void
 292  323  dmu_objset_byteswap(void *buf, size_t size)
 293  324  {
 294  325          objset_phys_t *osp = buf;
 295  326  
 296  327          ASSERT(size == OBJSET_OLD_PHYS_SIZE || size == sizeof (objset_phys_t));
 297  328          dnode_byteswap(&osp->os_meta_dnode);
 298  329          byteswap_uint64_array(&osp->os_zil_header, sizeof (zil_header_t));
 299  330          osp->os_type = BSWAP_64(osp->os_type);
 300  331          osp->os_flags = BSWAP_64(osp->os_flags);
 301  332          if (size == sizeof (objset_phys_t)) {
 302  333                  dnode_byteswap(&osp->os_userused_dnode);
 303  334                  dnode_byteswap(&osp->os_groupused_dnode);
 304  335          }
 305  336  }
 306  337  
 307  338  /*
 308  339   * The hash is a CRC-based hash of the objset_t pointer and the object number.
 309  340   */
 310  341  static uint64_t
 311  342  dnode_hash(const objset_t *os, uint64_t obj)
 312  343  {
 313  344          uintptr_t osv = (uintptr_t)os;
 314  345          uint64_t crc = -1ULL;
 315  346  
 316  347          ASSERT(zfs_crc64_table[128] == ZFS_CRC64_POLY);
 317  348          /*
 318  349           * The low 6 bits of the pointer don't have much entropy, because
 319  350           * the objset_t is larger than 2^6 bytes long.
 320  351           */
 321  352          crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (osv >> 6)) & 0xFF];
 322  353          crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (obj >> 0)) & 0xFF];
 323  354          crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (obj >> 8)) & 0xFF];
 324  355          crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (obj >> 16)) & 0xFF];
 325  356  
 326  357          crc ^= (osv>>14) ^ (obj>>24);
 327  358  
 328  359          return (crc);
 329  360  }
 330  361  
 331  362  unsigned int
 332  363  dnode_multilist_index_func(multilist_t *ml, void *obj)
 333  364  {
 334  365          dnode_t *dn = obj;
 335  366          return (dnode_hash(dn->dn_objset, dn->dn_object) %
 336  367              multilist_get_num_sublists(ml));
 337  368  }
 338  369  
 339  370  /*
 340  371   * Instantiates the objset_t in-memory structure corresponding to the
 341  372   * objset_phys_t that's pointed to by the specified blkptr_t.

↓ open down ↓

90 lines elided

↑ open up ↑

 342  373   */
 343  374  int
 344  375  dmu_objset_open_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
 345  376      objset_t **osp)
 346  377  {
 347  378          objset_t *os;
 348  379          int i, err;
 349  380  
 350  381          ASSERT(ds == NULL || MUTEX_HELD(&ds->ds_opening_lock));
 351  382  
 352      -        /*
 353      -         * The $ORIGIN dataset (if it exists) doesn't have an associated
 354      -         * objset, so there's no reason to open it. The $ORIGIN dataset
 355      -         * will not exist on pools older than SPA_VERSION_ORIGIN.
 356      -         */
 357      -        if (ds != NULL && spa_get_dsl(spa) != NULL &&
 358      -            spa_get_dsl(spa)->dp_origin_snap != NULL) {
 359      -                ASSERT3P(ds->ds_dir, !=,
 360      -                    spa_get_dsl(spa)->dp_origin_snap->ds_dir);
 361      -        }
 362      -
 363  383          os = kmem_zalloc(sizeof (objset_t), KM_SLEEP);
 364  384          os->os_dsl_dataset = ds;
 365  385          os->os_spa = spa;
 366  386          os->os_rootbp = bp;
 367  387          if (!BP_IS_HOLE(os->os_rootbp)) {
 368  388                  arc_flags_t aflags = ARC_FLAG_WAIT;
 369  389                  zbookmark_phys_t zb;
 370  390                  SET_BOOKMARK(&zb, ds ? ds->ds_object : DMU_META_OBJSET,
 371  391                      ZB_ROOT_OBJECT, ZB_ROOT_LEVEL, ZB_ROOT_BLKID);
 372  392
 373  393                  if (DMU_OS_IS_L2CACHEABLE(os))
 374  394                          aflags |= ARC_FLAG_L2CACHE;
 375  395  
 376  396                  dprintf_bp(os->os_rootbp, "reading %s", "");
 377  397                  err = arc_read(NULL, spa, os->os_rootbp,
 378  398                      arc_getbuf_func, &os->os_phys_buf,
 379  399                      ZIO_PRIORITY_SYNC_READ, ZIO_FLAG_CANFAIL, &aflags, &zb);
 380  400                  if (err != 0) {
 381  401                          kmem_free(os, sizeof (objset_t));
 382  402                          /* convert checksum errors into IO errors */
 383  403                          if (err == ECKSUM)
 384  404                                  err = SET_ERROR(EIO);
 385  405                          return (err);
 386  406                  }
 387  407  
 388  408                  /* Increase the blocksize if we are permitted. */
 389  409                  if (spa_version(spa) >= SPA_VERSION_USERSPACE &&
 390  410                      arc_buf_size(os->os_phys_buf) < sizeof (objset_phys_t)) {
 391  411                          arc_buf_t *buf = arc_alloc_buf(spa, &os->os_phys_buf,
 392  412                              ARC_BUFC_METADATA, sizeof (objset_phys_t));
 393  413                          bzero(buf->b_data, sizeof (objset_phys_t));
 394  414                          bcopy(os->os_phys_buf->b_data, buf->b_data,
 395  415                              arc_buf_size(os->os_phys_buf));
 396  416                          arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf);
 397  417                          os->os_phys_buf = buf;
 398  418                  }
 399  419  
 400  420                  os->os_phys = os->os_phys_buf->b_data;
 401  421                  os->os_flags = os->os_phys->os_flags;
 402  422          } else {
 403  423                  int size = spa_version(spa) >= SPA_VERSION_USERSPACE ?
 404  424                      sizeof (objset_phys_t) : OBJSET_OLD_PHYS_SIZE;
 405  425                  os->os_phys_buf = arc_alloc_buf(spa, &os->os_phys_buf,
 406  426                      ARC_BUFC_METADATA, size);
 407  427                  os->os_phys = os->os_phys_buf->b_data;
 408  428                  bzero(os->os_phys, size);
 409  429          }
 410  430  
 411  431          /*
 412  432           * Note: the changed_cb will be called once before the register
 413  433           * func returns, thus changing the checksum/compression from the
 414  434           * default (fletcher2/off).  Snapshots don't need to know about
 415  435           * checksum/compression/copies.
 416  436           */
 417  437          if (ds != NULL) {
 418  438                  boolean_t needlock = B_FALSE;
 419  439  
 420  440                  /*
 421  441                   * Note: it's valid to open the objset if the dataset is
 422  442                   * long-held, in which case the pool_config lock will not
 423  443                   * be held.
 424  444                   */
 425  445                  if (!dsl_pool_config_held(dmu_objset_pool(os))) {
 426  446                          needlock = B_TRUE;

↓ open down ↓

54 lines elided

↑ open up ↑

 427  447                          dsl_pool_config_enter(dmu_objset_pool(os), FTAG);
 428  448                  }
 429  449                  err = dsl_prop_register(ds,
 430  450                      zfs_prop_to_name(ZFS_PROP_PRIMARYCACHE),
 431  451                      primary_cache_changed_cb, os);
 432  452                  if (err == 0) {
 433  453                          err = dsl_prop_register(ds,
 434  454                              zfs_prop_to_name(ZFS_PROP_SECONDARYCACHE),
 435  455                              secondary_cache_changed_cb, os);
 436  456                  }
      457 +                if (err == 0) {
      458 +                        err = dsl_prop_register(ds,
      459 +                            zfs_prop_to_name(ZFS_PROP_ZPL_META_TO_METADEV),
      460 +                            zpl_meta_placement_changed_cb, os);
      461 +                }
 437  462                  if (!ds->ds_is_snapshot) {
 438  463                          if (err == 0) {
 439  464                                  err = dsl_prop_register(ds,
 440  465                                      zfs_prop_to_name(ZFS_PROP_CHECKSUM),
 441  466                                      checksum_changed_cb, os);
 442  467                          }
 443  468                          if (err == 0) {
 444  469                                  err = dsl_prop_register(ds,
 445  470                                      zfs_prop_to_name(ZFS_PROP_COMPRESSION),
 446  471                                      compression_changed_cb, os);
 447  472                          }
 448  473                          if (err == 0) {
 449  474                                  err = dsl_prop_register(ds,
      475 +                                    zfs_prop_to_name(ZFS_PROP_SMARTCOMPRESSION),
      476 +                                    smartcomp_changed_cb, os);
      477 +                        }
      478 +                        if (err == 0) {
      479 +                                err = dsl_prop_register(ds,
 450  480                                      zfs_prop_to_name(ZFS_PROP_COPIES),
 451  481                                      copies_changed_cb, os);
 452  482                          }
 453  483                          if (err == 0) {
 454  484                                  err = dsl_prop_register(ds,
 455  485                                      zfs_prop_to_name(ZFS_PROP_DEDUP),
 456  486                                      dedup_changed_cb, os);
 457  487                          }
 458  488                          if (err == 0) {
 459  489                                  err = dsl_prop_register(ds,
 460  490                                      zfs_prop_to_name(ZFS_PROP_LOGBIAS),
 461  491                                      logbias_changed_cb, os);
 462  492                          }
 463  493                          if (err == 0) {
 464  494                                  err = dsl_prop_register(ds,
 465  495                                      zfs_prop_to_name(ZFS_PROP_SYNC),
 466  496                                      sync_changed_cb, os);
 467  497                          }
 468  498                          if (err == 0) {
 469  499                                  err = dsl_prop_register(ds,
 470  500                                      zfs_prop_to_name(
 471  501                                      ZFS_PROP_REDUNDANT_METADATA),
 472  502                                      redundant_metadata_changed_cb, os);
 473  503                          }
 474  504                          if (err == 0) {
 475  505                                  err = dsl_prop_register(ds,
 476  506                                      zfs_prop_to_name(ZFS_PROP_RECORDSIZE),
 477  507                                      recordsize_changed_cb, os);
 478  508                          }
      509 +                        if (err == 0) {
      510 +                                err = dsl_prop_register(ds,
      511 +                                    zfs_prop_to_name(ZFS_PROP_WBC_MODE),
      512 +                                    wbc_mode_changed, os);
      513 +                        }
 479  514                  }
 480  515                  if (needlock)
 481  516                          dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
 482  517                  if (err != 0) {
 483  518                          arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf);
 484  519                          kmem_free(os, sizeof (objset_t));
 485  520                          return (err);
 486  521                  }
 487  522          } else {
 488  523                  /* It's the meta-objset. */
 489  524                  os->os_checksum = ZIO_CHECKSUM_FLETCHER_4;
 490  525                  os->os_compress = ZIO_COMPRESS_ON;
 491  526                  os->os_copies = spa_max_replication(spa);
 492  527                  os->os_dedup_checksum = ZIO_CHECKSUM_OFF;
 493  528                  os->os_dedup_verify = B_FALSE;
 494  529                  os->os_logbias = ZFS_LOGBIAS_LATENCY;
 495  530                  os->os_sync = ZFS_SYNC_STANDARD;
 496  531                  os->os_primary_cache = ZFS_CACHE_ALL;
 497  532                  os->os_secondary_cache = ZFS_CACHE_ALL;
      533 +                os->os_zpl_meta_to_special = 0;
 498  534          }
      535 +        /*
      536 +         * These properties will be filled in by the logic in zfs_get_zplprop()
      537 +         * when they are queried for the first time.
      538 +         */
      539 +        os->os_version = OBJSET_PROP_UNINITIALIZED;
      540 +        os->os_normalization = OBJSET_PROP_UNINITIALIZED;
      541 +        os->os_utf8only = OBJSET_PROP_UNINITIALIZED;
      542 +        os->os_casesensitivity = OBJSET_PROP_UNINITIALIZED;
 499  543  
 500  544          if (ds == NULL || !ds->ds_is_snapshot)
 501  545                  os->os_zil_header = os->os_phys->os_zil_header;
 502  546          os->os_zil = zil_alloc(os, &os->os_zil_header);
 503  547  
 504  548          for (i = 0; i < TXG_SIZE; i++) {
 505  549                  os->os_dirty_dnodes[i] = multilist_create(sizeof (dnode_t),
 506  550                      offsetof(dnode_t, dn_dirty_link[i]),
 507  551                      dnode_multilist_index_func);
 508  552          }
 509  553          list_create(&os->os_dnodes, sizeof (dnode_t),
 510  554              offsetof(dnode_t, dn_link));
 511  555          list_create(&os->os_downgraded_dbufs, sizeof (dmu_buf_impl_t),
 512  556              offsetof(dmu_buf_impl_t, db_link));
 513  557  
 514  558          mutex_init(&os->os_lock, NULL, MUTEX_DEFAULT, NULL);
 515  559          mutex_init(&os->os_userused_lock, NULL, MUTEX_DEFAULT, NULL);
 516  560          mutex_init(&os->os_obj_lock, NULL, MUTEX_DEFAULT, NULL);
 517  561          mutex_init(&os->os_user_ptr_lock, NULL, MUTEX_DEFAULT, NULL);
 518  562  
 519  563          dnode_special_open(os, &os->os_phys->os_meta_dnode,
 520  564              DMU_META_DNODE_OBJECT, &os->os_meta_dnode);
 521  565          if (arc_buf_size(os->os_phys_buf) >= sizeof (objset_phys_t)) {
 522  566                  dnode_special_open(os, &os->os_phys->os_userused_dnode,
 523  567                      DMU_USERUSED_OBJECT, &os->os_userused_dnode);
 524  568                  dnode_special_open(os, &os->os_phys->os_groupused_dnode,
 525  569                      DMU_GROUPUSED_OBJECT, &os->os_groupused_dnode);
 526  570          }
 527  571  
 528  572          *osp = os;
 529  573          return (0);
 530  574  }
 531  575  
 532  576  int
 533  577  dmu_objset_from_ds(dsl_dataset_t *ds, objset_t **osp)
 534  578  {
 535  579          int err = 0;
 536  580  
 537  581          /*
 538  582           * We shouldn't be doing anything with dsl_dataset_t's unless the
 539  583           * pool_config lock is held, or the dataset is long-held.
 540  584           */
 541  585          ASSERT(dsl_pool_config_held(ds->ds_dir->dd_pool) ||
 542  586              dsl_dataset_long_held(ds));
 543  587  
 544  588          mutex_enter(&ds->ds_opening_lock);
 545  589          if (ds->ds_objset == NULL) {
 546  590                  objset_t *os;
 547  591                  rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
 548  592                  err = dmu_objset_open_impl(dsl_dataset_get_spa(ds),
 549  593                      ds, dsl_dataset_get_blkptr(ds), &os);
 550  594                  rrw_exit(&ds->ds_bp_rwlock, FTAG);
 551  595  
 552  596                  if (err == 0) {
 553  597                          mutex_enter(&ds->ds_lock);
 554  598                          ASSERT(ds->ds_objset == NULL);
 555  599                          ds->ds_objset = os;
 556  600                          mutex_exit(&ds->ds_lock);
 557  601                  }
 558  602          }
 559  603          *osp = ds->ds_objset;
 560  604          mutex_exit(&ds->ds_opening_lock);
 561  605          return (err);
 562  606  }
 563  607  
 564  608  /*
 565  609   * Holds the pool while the objset is held.  Therefore only one objset
 566  610   * can be held at a time.
 567  611   */
 568  612  int
 569  613  dmu_objset_hold(const char *name, void *tag, objset_t **osp)
 570  614  {
 571  615          dsl_pool_t *dp;
 572  616          dsl_dataset_t *ds;
 573  617          int err;
 574  618  
 575  619          err = dsl_pool_hold(name, tag, &dp);
 576  620          if (err != 0)
 577  621                  return (err);
 578  622          err = dsl_dataset_hold(dp, name, tag, &ds);
 579  623          if (err != 0) {
 580  624                  dsl_pool_rele(dp, tag);
 581  625                  return (err);
 582  626          }
 583  627  
 584  628          err = dmu_objset_from_ds(ds, osp);
 585  629          if (err != 0) {
 586  630                  dsl_dataset_rele(ds, tag);
 587  631                  dsl_pool_rele(dp, tag);
 588  632          }
 589  633  
 590  634          return (err);
 591  635  }
 592  636  
 593  637  static int
 594  638  dmu_objset_own_impl(dsl_dataset_t *ds, dmu_objset_type_t type,
 595  639      boolean_t readonly, void *tag, objset_t **osp)
 596  640  {
 597  641          int err;
 598  642  
 599  643          err = dmu_objset_from_ds(ds, osp);
 600  644          if (err != 0) {
 601  645                  dsl_dataset_disown(ds, tag);
 602  646          } else if (type != DMU_OST_ANY && type != (*osp)->os_phys->os_type) {
 603  647                  dsl_dataset_disown(ds, tag);
 604  648                  return (SET_ERROR(EINVAL));
 605  649          } else if (!readonly && dsl_dataset_is_snapshot(ds)) {
 606  650                  dsl_dataset_disown(ds, tag);
 607  651                  return (SET_ERROR(EROFS));
 608  652          }
 609  653          return (err);
 610  654  }
 611  655  
 612  656  /*
 613  657   * dsl_pool must not be held when this is called.
 614  658   * Upon successful return, there will be a longhold on the dataset,
 615  659   * and the dsl_pool will not be held.
 616  660   */
 617  661  int
 618  662  dmu_objset_own(const char *name, dmu_objset_type_t type,
 619  663      boolean_t readonly, void *tag, objset_t **osp)
 620  664  {
 621  665          dsl_pool_t *dp;
 622  666          dsl_dataset_t *ds;
 623  667          int err;
 624  668  
 625  669          err = dsl_pool_hold(name, FTAG, &dp);
 626  670          if (err != 0)
 627  671                  return (err);
 628  672          err = dsl_dataset_own(dp, name, tag, &ds);
 629  673          if (err != 0) {
 630  674                  dsl_pool_rele(dp, FTAG);
 631  675                  return (err);
 632  676          }
 633  677          err = dmu_objset_own_impl(ds, type, readonly, tag, osp);
 634  678          dsl_pool_rele(dp, FTAG);
 635  679  
 636  680          return (err);
 637  681  }
 638  682  
 639  683  int
 640  684  dmu_objset_own_obj(dsl_pool_t *dp, uint64_t obj, dmu_objset_type_t type,
 641  685      boolean_t readonly, void *tag, objset_t **osp)
 642  686  {
 643  687          dsl_dataset_t *ds;
 644  688          int err;
 645  689  
 646  690          err = dsl_dataset_own_obj(dp, obj, tag, &ds);
 647  691          if (err != 0)
 648  692                  return (err);
 649  693  
 650  694          return (dmu_objset_own_impl(ds, type, readonly, tag, osp));
 651  695  }
 652  696  
 653  697  void
 654  698  dmu_objset_rele(objset_t *os, void *tag)
 655  699  {
 656  700          dsl_pool_t *dp = dmu_objset_pool(os);
 657  701          dsl_dataset_rele(os->os_dsl_dataset, tag);
 658  702          dsl_pool_rele(dp, tag);
 659  703  }
 660  704  
 661  705  /*
 662  706   * When we are called, os MUST refer to an objset associated with a dataset
 663  707   * that is owned by 'tag'; that is, is held and long held by 'tag' and ds_owner
 664  708   * == tag.  We will then release and reacquire ownership of the dataset while
 665  709   * holding the pool config_rwlock to avoid intervening namespace or ownership
 666  710   * changes that may occur.
 667  711   *
 668  712   * This exists solely to accommodate zfs_ioc_userspace_upgrade()'s desire to
 669  713   * release the hold on its dataset and acquire a new one on the dataset of the
 670  714   * same name so that it can be partially torn down and reconstructed.
 671  715   */
 672  716  void
 673  717  dmu_objset_refresh_ownership(objset_t *os, void *tag)
 674  718  {
 675  719          dsl_pool_t *dp;
 676  720          dsl_dataset_t *ds, *newds;
 677  721          char name[ZFS_MAX_DATASET_NAME_LEN];
 678  722  
 679  723          ds = os->os_dsl_dataset;
 680  724          VERIFY3P(ds, !=, NULL);
 681  725          VERIFY3P(ds->ds_owner, ==, tag);
 682  726          VERIFY(dsl_dataset_long_held(ds));
 683  727  
 684  728          dsl_dataset_name(ds, name);
 685  729          dp = dmu_objset_pool(os);
 686  730          dsl_pool_config_enter(dp, FTAG);
 687  731          dmu_objset_disown(os, tag);
 688  732          VERIFY0(dsl_dataset_own(dp, name, tag, &newds));
 689  733          VERIFY3P(newds, ==, os->os_dsl_dataset);
 690  734          dsl_pool_config_exit(dp, FTAG);
 691  735  }
 692  736  
 693  737  void
 694  738  dmu_objset_disown(objset_t *os, void *tag)
 695  739  {
 696  740          dsl_dataset_disown(os->os_dsl_dataset, tag);
 697  741  }
 698  742  
 699  743  void
 700  744  dmu_objset_evict_dbufs(objset_t *os)
 701  745  {
 702  746          dnode_t dn_marker;
 703  747          dnode_t *dn;
 704  748  
 705  749          mutex_enter(&os->os_lock);
 706  750          dn = list_head(&os->os_dnodes);

 707  751          while (dn != NULL) {
 708  752                  /*
 709  753                   * Skip dnodes without holds.  We have to do this dance
 710  754                   * because dnode_add_ref() only works if there is already a
 711  755                   * hold.  If the dnode has no holds, then it has no dbufs.
 712  756                   */
 713  757                  if (dnode_add_ref(dn, FTAG)) {
 714  758                          list_insert_after(&os->os_dnodes, dn, &dn_marker);
 715  759                          mutex_exit(&os->os_lock);
 716  760  
 717      -                        dnode_evict_dbufs(dn);
      761 +                        dnode_evict_dbufs(dn, DBUF_EVICT_ALL);
 718  762                          dnode_rele(dn, FTAG);
 719  763  
 720  764                          mutex_enter(&os->os_lock);
 721  765                          dn = list_next(&os->os_dnodes, &dn_marker);
 722  766                          list_remove(&os->os_dnodes, &dn_marker);
 723  767                  } else {
 724  768                          dn = list_next(&os->os_dnodes, dn);
 725  769                  }
 726  770          }
 727  771          mutex_exit(&os->os_lock);
 728  772  
 729  773          if (DMU_USERUSED_DNODE(os) != NULL) {
 730      -                dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os));
 731      -                dnode_evict_dbufs(DMU_USERUSED_DNODE(os));
      774 +                dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os), DBUF_EVICT_ALL);
      775 +                dnode_evict_dbufs(DMU_USERUSED_DNODE(os), DBUF_EVICT_ALL);
 732  776          }
 733      -        dnode_evict_dbufs(DMU_META_DNODE(os));
      777 +        dnode_evict_dbufs(DMU_META_DNODE(os), DBUF_EVICT_ALL);
 734  778  }
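
The loop in dmu_objset_evict_dbufs() keeps its place in os_dnodes with a stack-allocated marker node while the list lock is dropped. The following is a minimal user-space sketch of that marker technique only, not the kernel implementation: the sk_* list type and helpers are hypothetical stand-ins for list_t and dnode_t, locking is reduced to comments, and the callback unlinks the visited node to imitate a concurrent removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and list types; the list is circular with a sentinel. */
typedef struct sk_node {
	struct sk_node *prev, *next;
} sk_node_t;

typedef struct sk_list {
	sk_node_t head;
} sk_list_t;

static void
sk_list_init(sk_list_t *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void
sk_insert_after(sk_node_t *pos, sk_node_t *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void
sk_remove(sk_node_t *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

static sk_node_t *
sk_first(sk_list_t *l)
{
	return (l->head.next == &l->head ? NULL : l->head.next);
}

static sk_node_t *
sk_after(sk_list_t *l, sk_node_t *n)
{
	return (n->next == &l->head ? NULL : n->next);
}

/* Callback that removes the node it is handed, as another thread might. */
static void
sk_unlink_self(sk_node_t *n)
{
	sk_remove(n);
}

/* Visit every node; work() is allowed to unlink the node it receives. */
int
sk_walk_with_marker(sk_list_t *l, void (*work)(sk_node_t *))
{
	sk_node_t marker;
	int visited = 0;
	sk_node_t *n = sk_first(l);

	while (n != NULL) {
		sk_insert_after(n, &marker);	/* remember our position */
		/* real code would drop the list lock here */
		work(n);
		visited++;
		/* real code would retake the list lock here */
		n = sk_after(l, &marker);	/* resume from the marker */
		sk_remove(&marker);
	}
	return (visited);
}

/* Build a three-node list, walk it, and report how many nodes were seen. */
int
sk_demo_walk(void)
{
	sk_list_t list;
	sk_node_t nodes[3];

	sk_list_init(&list);
	for (int i = 0; i < 3; i++)
		sk_insert_after(list.head.prev, &nodes[i]);	/* append */

	int visited = sk_walk_with_marker(&list, sk_unlink_self);
	assert(sk_first(&list) == NULL);
	return (visited);
}
```

Because the walk advances from the marker rather than from the visited node, it still reaches every element even though each one is unlinked while the lock would be dropped.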
 735  779  
 736  780  /*
 737  781   * Objset eviction processing is split into two pieces.
 738  782   * The first marks the objset as evicting, evicts any dbufs that
 739  783   * have a refcount of zero, and then queues up the objset for the
 740  784   * second phase of eviction.  Once os->os_dnodes has been cleared by
 741  785   * dnode_buf_pageout()->dnode_destroy(), the second phase is executed.
 742  786   * The second phase closes the special dnodes, dequeues the objset from
 743  787   * the list of those undergoing eviction, and finally frees the objset.

 744  788   *
 745  789   * NOTE: Due to asynchronous eviction processing (invocation of
 746  790   *       dnode_buf_pageout()), it is possible for the meta dnode for the
 747  791   *       objset to have no holds even though os->os_dnodes is not empty.
 748  792   */
 749  793  void
 750  794  dmu_objset_evict(objset_t *os)
 751  795  {
 752  796          dsl_dataset_t *ds = os->os_dsl_dataset;
 753  797  
 754  798          for (int t = 0; t < TXG_SIZE; t++)
 755  799                  ASSERT(!dmu_objset_is_dirty(os, t));
 756  800  
 757  801          if (ds)
 758  802                  dsl_prop_unregister_all(ds, os);
 759  803  
 760  804          if (os->os_sa)
 761  805                  sa_tear_down(os);
 762  806  
 763  807          dmu_objset_evict_dbufs(os);
 764  808  
 765  809          mutex_enter(&os->os_lock);
 766  810          spa_evicting_os_register(os->os_spa, os);
 767  811          if (list_is_empty(&os->os_dnodes)) {
 768  812                  mutex_exit(&os->os_lock);
 769  813                  dmu_objset_evict_done(os);
 770  814          } else {
 771  815                  mutex_exit(&os->os_lock);
 772  816          }
 773  817  }
 774  818  
 775  819  void
 776  820  dmu_objset_evict_done(objset_t *os)
 777  821  {
 778  822          ASSERT3P(list_head(&os->os_dnodes), ==, NULL);
 779  823  
 780  824          dnode_special_close(&os->os_meta_dnode);
 781  825          if (DMU_USERUSED_DNODE(os)) {
 782  826                  dnode_special_close(&os->os_userused_dnode);
 783  827                  dnode_special_close(&os->os_groupused_dnode);
 784  828          }
 785  829          zil_free(os->os_zil);
 786  830  
 787  831          arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf);
 788  832  
 789  833          /*
 790  834           * This is a barrier to prevent the objset from going away in
 791  835           * dnode_move() until we can safely ensure that the objset is still in
 792  836           * use. We consider the objset valid before the barrier and invalid
 793  837           * after the barrier.
 794  838           */
 795  839          rw_enter(&os_lock, RW_READER);
 796  840          rw_exit(&os_lock);
 797  841  
 798  842          mutex_destroy(&os->os_lock);
 799  843          mutex_destroy(&os->os_userused_lock);
 800  844          mutex_destroy(&os->os_obj_lock);
 801  845          mutex_destroy(&os->os_user_ptr_lock);
 802  846          for (int i = 0; i < TXG_SIZE; i++) {
 803  847                  multilist_destroy(os->os_dirty_dnodes[i]);
 804  848          }
 805  849          spa_evicting_os_deregister(os->os_spa, os);
 806  850          kmem_free(os, sizeof (objset_t));
 807  851  }
 808  852  
 809  853  timestruc_t
 810  854  dmu_objset_snap_cmtime(objset_t *os)
 811  855  {
 812  856          return (dsl_dir_snap_cmtime(os->os_dsl_dataset->ds_dir));
 813  857  }
 814  858  
 815  859  /* called from dsl for meta-objset */
 816  860  objset_t *
 817  861  dmu_objset_create_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
 818  862      dmu_objset_type_t type, dmu_tx_t *tx)
 819  863  {
 820  864          objset_t *os;
 821  865          dnode_t *mdn;
 822  866  
 823  867          ASSERT(dmu_tx_is_syncing(tx));
 824  868  
 825  869          if (ds != NULL)
 826  870                  VERIFY0(dmu_objset_from_ds(ds, &os));
 827  871          else
 828  872                  VERIFY0(dmu_objset_open_impl(spa, NULL, bp, &os));
 829  873  
 830  874          mdn = DMU_META_DNODE(os);
 831  875  
 832  876          dnode_allocate(mdn, DMU_OT_DNODE, 1 << DNODE_BLOCK_SHIFT,
 833  877              DN_MAX_INDBLKSHIFT, DMU_OT_NONE, 0, tx);
 834  878  
 835  879          /*
 836  880           * We don't want to have to increase the meta-dnode's nlevels
 837  881           * later, because then we could do it in quiescing context while
 838  882           * we are also accessing it in open context.
 839  883           *
 840  884           * This precaution is not necessary for the MOS (ds == NULL),
 841  885           * because the MOS is only updated in syncing context.
 842  886           * This is most fortunate: the MOS is the only objset that
 843  887           * needs to be synced multiple times as spa_sync() iterates
 844  888           * to convergence, so minimizing its dn_nlevels matters.
 845  889           */
 846  890          if (ds != NULL) {
 847  891                  int levels = 1;
 848  892  
 849  893                  /*
 850  894                   * Determine the number of levels necessary for the meta-dnode
 851  895                   * to contain DN_MAX_OBJECT dnodes.  Note that in order to
 852  896                   * ensure that we do not overflow 64 bits, there has to be
 853  897                   * a nlevels that gives us a number of blocks > DN_MAX_OBJECT
 854  898                   * but < 2^64.  Therefore,
 855  899                   * (mdn->dn_indblkshift - SPA_BLKPTRSHIFT) (10) must be
 856  900                   * less than (64 - log2(DN_MAX_OBJECT)) (16).
 857  901                   */
 858  902                  while ((uint64_t)mdn->dn_nblkptr <<
 859  903                      (mdn->dn_datablkshift - DNODE_SHIFT +
 860  904                      (levels - 1) * (mdn->dn_indblkshift - SPA_BLKPTRSHIFT)) <
 861  905                      DN_MAX_OBJECT)
 862  906                          levels++;
 863  907  
 864  908                  mdn->dn_next_nlevels[tx->tx_txg & TXG_MASK] =
 865  909                      mdn->dn_nlevels = levels;
 866  910          }
 867  911  
 868  912          ASSERT(type != DMU_OST_NONE);
 869  913          ASSERT(type != DMU_OST_ANY);
 870  914          ASSERT(type < DMU_OST_NUMTYPES);
 871  915          os->os_phys->os_type = type;
 872  916          if (dmu_objset_userused_enabled(os)) {
 873  917                  os->os_phys->os_flags |= OBJSET_FLAG_USERACCOUNTING_COMPLETE;
 874  918                  os->os_flags = os->os_phys->os_flags;
 875  919          }
 876  920  
 877  921          dsl_dataset_dirty(ds, tx);
 878  922  
 879  923          return (os);
 880  924  }
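
The level computation in dmu_objset_create_impl() can be followed with concrete numbers. The sketch below reproduces only the while loop; the SK_* constants are assumed values (512-byte dnodes, 128-byte block pointers, DN_MAX_OBJECT taken as 2^48, matching the "16" in the comment above), and sk_meta_dnode_levels() is a hypothetical name, not a function in this file.

```c
#include <assert.h>
#include <stdint.h>

#define	SK_DNODE_SHIFT		9	/* assumed: 512-byte dnodes */
#define	SK_BLKPTR_SHIFT		7	/* assumed: 128-byte block pointers */
#define	SK_MAX_OBJECT_SHIFT	48	/* assumed: DN_MAX_OBJECT == 2^48 */

/*
 * Return the smallest number of levels for which the block pointers
 * can address at least 2^48 dnodes.
 */
int
sk_meta_dnode_levels(uint64_t nblkptr, int datablkshift, int indblkshift)
{
	const uint64_t max_object = 1ULL << SK_MAX_OBJECT_SHIFT;
	int levels = 1;

	for (;;) {
		/* log2 of the dnodes reachable through one block pointer */
		int shift = datablkshift - SK_DNODE_SHIFT +
		    (levels - 1) * (indblkshift - SK_BLKPTR_SHIFT);

		assert(shift < 64);	/* keep the shift well defined */
		if ((nblkptr << shift) >= max_object)
			break;
		levels++;
	}
	return (levels);
}
```

Assuming three block pointers, 16K data blocks (shift 14) and 128K indirect blocks (shift 17), each extra level adds 10 bits: the reachable count is 3 << 45 at five levels, still below 2^48, and 3 << 55 at six levels, so the loop stops at six.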
 881  925  
 882  926  typedef struct dmu_objset_create_arg {
 883  927          const char *doca_name;
 884  928          cred_t *doca_cred;
 885  929          void (*doca_userfunc)(objset_t *os, void *arg,
 886  930              cred_t *cr, dmu_tx_t *tx);
 887  931          void *doca_userarg;
 888  932          dmu_objset_type_t doca_type;
 889  933          uint64_t doca_flags;
 890  934  } dmu_objset_create_arg_t;
 891  935  
 892  936  /*ARGSUSED*/
 893  937  static int
 894  938  dmu_objset_create_check(void *arg, dmu_tx_t *tx)
 895  939  {
 896  940          dmu_objset_create_arg_t *doca = arg;
 897  941          dsl_pool_t *dp = dmu_tx_pool(tx);
 898  942          dsl_dir_t *pdd;
 899  943          const char *tail;
 900  944          int error;
 901  945  
 902  946          if (strchr(doca->doca_name, '@') != NULL)
 903  947                  return (SET_ERROR(EINVAL));
 904  948  
 905  949          if (strlen(doca->doca_name) >= ZFS_MAX_DATASET_NAME_LEN)
 906  950                  return (SET_ERROR(ENAMETOOLONG));
 907  951  
 908  952          error = dsl_dir_hold(dp, doca->doca_name, FTAG, &pdd, &tail);
 909  953          if (error != 0)
 910  954                  return (error);
 911  955          if (tail == NULL) {
 912  956                  dsl_dir_rele(pdd, FTAG);
 913  957                  return (SET_ERROR(EEXIST));
 914  958          }
 915  959          error = dsl_fs_ss_limit_check(pdd, 1, ZFS_PROP_FILESYSTEM_LIMIT, NULL,
 916  960              doca->doca_cred);
 917  961          dsl_dir_rele(pdd, FTAG);
 918  962  
 919  963          return (error);
 920  964  }
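
The first two tests in dmu_objset_create_check() are plain string checks on the proposed name. A minimal stand-alone sketch of just those checks follows; sk_check_new_name() is a hypothetical helper, the length limit of 256 is an assumed stand-in for ZFS_MAX_DATASET_NAME_LEN, and the dsl_dir lookup and limit check of the real function are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed value; the real constant comes from the ZFS headers. */
#define	SK_MAX_DATASET_NAME_LEN	256

/* Reject snapshot-style names and names that are too long. */
int
sk_check_new_name(const char *name)
{
	assert(name != NULL);

	/* '@' separates a snapshot name, so it cannot appear here. */
	if (strchr(name, '@') != NULL)
		return (EINVAL);

	/* The limit includes the terminating NUL, hence >=. */
	if (strlen(name) >= SK_MAX_DATASET_NAME_LEN)
		return (ENAMETOOLONG);

	return (0);
}
```

Under these assumptions a name such as "tank/fs" passes, while "tank/fs@snap" is rejected with EINVAL before any pool state is consulted.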
 921  965  
 922  966  static void
 923  967  dmu_objset_create_sync(void *arg, dmu_tx_t *tx)
 924  968  {
 925  969          dmu_objset_create_arg_t *doca = arg;
 926  970          dsl_pool_t *dp = dmu_tx_pool(tx);
 927  971          dsl_dir_t *pdd;
 928  972          const char *tail;
 929  973          dsl_dataset_t *ds;
 930  974          uint64_t obj;
 931  975          blkptr_t *bp;
 932  976          objset_t *os;
 933  977  
 934  978          VERIFY0(dsl_dir_hold(dp, doca->doca_name, FTAG, &pdd, &tail));
 935  979  
 936  980          obj = dsl_dataset_create_sync(pdd, tail, NULL, doca->doca_flags,
 937  981              doca->doca_cred, tx);
 938  982  
 939  983          VERIFY0(dsl_dataset_hold_obj(pdd->dd_pool, obj, FTAG, &ds));
 940  984          rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
 941  985          bp = dsl_dataset_get_blkptr(ds);
 942  986          os = dmu_objset_create_impl(pdd->dd_pool->dp_spa,
 943  987              ds, bp, doca->doca_type, tx);
 944  988          rrw_exit(&ds->ds_bp_rwlock, FTAG);
 945  989  
 946  990          if (doca->doca_userfunc != NULL) {
 947  991                  doca->doca_userfunc(os, doca->doca_userarg,
 948  992                      doca->doca_cred, tx);
 949  993          }
 950  994  
 951  995          spa_history_log_internal_ds(ds, "create", tx, "");
 952  996          dsl_dataset_rele(ds, FTAG);
 953  997          dsl_dir_rele(pdd, FTAG);
 954  998  }
 955  999  
 956 1000  int
 957 1001  dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
 958 1002      void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg)
 959 1003  {
 960 1004          dmu_objset_create_arg_t doca;
 961 1005  
 962 1006          doca.doca_name = name;
 963 1007          doca.doca_cred = CRED();
 964 1008          doca.doca_flags = flags;
 965 1009          doca.doca_userfunc = func;
 966 1010          doca.doca_userarg = arg;
 967 1011          doca.doca_type = type;
 968 1012  
 969 1013          return (dsl_sync_task(name,
 970 1014              dmu_objset_create_check, dmu_objset_create_sync, &doca,
 971 1015              5, ZFS_SPACE_CHECK_NORMAL));
 972 1016  }
 973 1017  
 974 1018  typedef struct dmu_objset_clone_arg {
 975 1019          const char *doca_clone;
 976 1020          const char *doca_origin;
 977 1021          cred_t *doca_cred;
 978 1022  } dmu_objset_clone_arg_t;
 979 1023  
 980 1024  /*ARGSUSED*/
 981 1025  static int
 982 1026  dmu_objset_clone_check(void *arg, dmu_tx_t *tx)
 983 1027  {
 984 1028          dmu_objset_clone_arg_t *doca = arg;
 985 1029          dsl_dir_t *pdd;
 986 1030          const char *tail;
 987 1031          int error;
 988 1032          dsl_dataset_t *origin;
 989 1033          dsl_pool_t *dp = dmu_tx_pool(tx);
 990 1034  
 991 1035          if (strchr(doca->doca_clone, '@') != NULL)
 992 1036                  return (SET_ERROR(EINVAL));
 993 1037  
 994 1038          if (strlen(doca->doca_clone) >= ZFS_MAX_DATASET_NAME_LEN)
 995 1039                  return (SET_ERROR(ENAMETOOLONG));
 996 1040  
 997 1041          error = dsl_dir_hold(dp, doca->doca_clone, FTAG, &pdd, &tail);
 998 1042          if (error != 0)
 999 1043                  return (error);
1000 1044          if (tail == NULL) {
1001 1045                  dsl_dir_rele(pdd, FTAG);
1002 1046                  return (SET_ERROR(EEXIST));
1003 1047          }
1004 1048  
1005 1049          error = dsl_fs_ss_limit_check(pdd, 1, ZFS_PROP_FILESYSTEM_LIMIT, NULL,
1006 1050              doca->doca_cred);
1007 1051          if (error != 0) {
1008 1052                  dsl_dir_rele(pdd, FTAG);
1009 1053                  return (SET_ERROR(EDQUOT));
1010 1054          }
1011 1055          dsl_dir_rele(pdd, FTAG);
1012 1056  
1013 1057          error = dsl_dataset_hold(dp, doca->doca_origin, FTAG, &origin);
1014 1058          if (error != 0)
1015 1059                  return (error);
1016 1060  
1017 1061          /* You can only clone snapshots, not the head datasets. */
1018 1062          if (!origin->ds_is_snapshot) {
1019 1063                  dsl_dataset_rele(origin, FTAG);
1020 1064                  return (SET_ERROR(EINVAL));
1021 1065          }
1022 1066          dsl_dataset_rele(origin, FTAG);
1023 1067  
1024 1068          return (0);
1025 1069  }
1026 1070  
1027 1071  static void
1028 1072  dmu_objset_clone_sync(void *arg, dmu_tx_t *tx)
1029 1073  {
1030 1074          dmu_objset_clone_arg_t *doca = arg;
1031 1075          dsl_pool_t *dp = dmu_tx_pool(tx);
1032 1076          dsl_dir_t *pdd;
1033 1077          const char *tail;
1034 1078          dsl_dataset_t *origin, *ds;
1035 1079          uint64_t obj;
1036 1080          char namebuf[ZFS_MAX_DATASET_NAME_LEN];
1037 1081  
1038 1082          VERIFY0(dsl_dir_hold(dp, doca->doca_clone, FTAG, &pdd, &tail));
1039 1083          VERIFY0(dsl_dataset_hold(dp, doca->doca_origin, FTAG, &origin));
1040 1084  
1041 1085          obj = dsl_dataset_create_sync(pdd, tail, origin, 0,
1042 1086              doca->doca_cred, tx);
1043 1087  
1044 1088          VERIFY0(dsl_dataset_hold_obj(pdd->dd_pool, obj, FTAG, &ds));
1045 1089          dsl_dataset_name(origin, namebuf);
1046 1090          spa_history_log_internal_ds(ds, "clone", tx,
1047 1091              "origin=%s (%llu)", namebuf, origin->ds_object);
1048 1092          dsl_dataset_rele(ds, FTAG);
1049 1093          dsl_dataset_rele(origin, FTAG);
1050 1094          dsl_dir_rele(pdd, FTAG);
1051 1095  }
1052 1096  
1053 1097  int
1054 1098  dmu_objset_clone(const char *clone, const char *origin)
1055 1099  {
1056 1100          dmu_objset_clone_arg_t doca;

1057 1101  
1058 1102          doca.doca_clone = clone;
1059 1103          doca.doca_origin = origin;
1060 1104          doca.doca_cred = CRED();
1061 1105  
1062 1106          return (dsl_sync_task(clone,
1063 1107              dmu_objset_clone_check, dmu_objset_clone_sync, &doca,
1064 1108              5, ZFS_SPACE_CHECK_NORMAL));
1065 1109  }
1066 1110  
1067      -static int
1068      -dmu_objset_remap_indirects_impl(objset_t *os, uint64_t last_removed_txg)
1069      -{
1070      -        int error = 0;
1071      -        uint64_t object = 0;
1072      -        while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
1073      -                error = dmu_object_remap_indirects(os, object,
1074      -                    last_removed_txg);
1075      -                /*
1076      -                 * If the ZPL removed the object before we managed to dnode_hold
1077      -                 * it, we would get an ENOENT. If the ZPL declares its intent
1078      -                 * to remove the object (dnode_free) before we manage to
1079      -                 * dnode_hold it, we would get an EEXIST. In either case, we
1080      -                 * want to continue remapping the other objects in the objset;
1081      -                 * in all other cases, we want to break early.
1082      -                 */
1083      -                if (error != 0 && error != ENOENT && error != EEXIST) {
1084      -                        break;
1085      -                }
1086      -        }
1087      -        if (error == ESRCH) {
1088      -                error = 0;
1089      -        }
1090      -        return (error);
1091      -}
1092      -
1093 1111  int
1094      -dmu_objset_remap_indirects(const char *fsname)
1095      -{
1096      -        int error = 0;
1097      -        objset_t *os = NULL;
1098      -        uint64_t last_removed_txg;
1099      -        uint64_t remap_start_txg;
1100      -        dsl_dir_t *dd;
1101      -
1102      -        error = dmu_objset_hold(fsname, FTAG, &os);
1103      -        if (error != 0) {
1104      -                return (error);
1105      -        }
1106      -        dd = dmu_objset_ds(os)->ds_dir;
1107      -
1108      -        if (!spa_feature_is_enabled(dmu_objset_spa(os),
1109      -            SPA_FEATURE_OBSOLETE_COUNTS)) {
1110      -                dmu_objset_rele(os, FTAG);
1111      -                return (SET_ERROR(ENOTSUP));
1112      -        }
1113      -
1114      -        if (dsl_dataset_is_snapshot(dmu_objset_ds(os))) {
1115      -                dmu_objset_rele(os, FTAG);
1116      -                return (SET_ERROR(EINVAL));
1117      -        }
1118      -
1119      -        /*
1120      -         * If there has not been a removal, we're done.
1121      -         */
1122      -        last_removed_txg = spa_get_last_removal_txg(dmu_objset_spa(os));
1123      -        if (last_removed_txg == -1ULL) {
1124      -                dmu_objset_rele(os, FTAG);
1125      -                return (0);
1126      -        }
1127      -
1128      -        /*
1129      -         * If we have remapped since the last removal, we're done.
1130      -         */
1131      -        if (dsl_dir_is_zapified(dd)) {
1132      -                uint64_t last_remap_txg;
1133      -                if (zap_lookup(spa_meta_objset(dmu_objset_spa(os)),
1134      -                    dd->dd_object, DD_FIELD_LAST_REMAP_TXG,
1135      -                    sizeof (last_remap_txg), 1, &last_remap_txg) == 0 &&
1136      -                    last_remap_txg > last_removed_txg) {
1137      -                        dmu_objset_rele(os, FTAG);
1138      -                        return (0);
1139      -                }
1140      -        }
1141      -
1142      -        dsl_dataset_long_hold(dmu_objset_ds(os), FTAG);
1143      -        dsl_pool_rele(dmu_objset_pool(os), FTAG);
1144      -
1145      -        remap_start_txg = spa_last_synced_txg(dmu_objset_spa(os));
1146      -        error = dmu_objset_remap_indirects_impl(os, last_removed_txg);
1147      -        if (error == 0) {
1148      -                /*
1149      -                 * We update the last_remap_txg to be the start txg so that
1150      -                 * we can guarantee that every block older than last_remap_txg
1151      -                 * that can be remapped has been remapped.
1152      -                 */
1153      -                error = dsl_dir_update_last_remap_txg(dd, remap_start_txg);
1154      -        }
1155      -
1156      -        dsl_dataset_long_rele(dmu_objset_ds(os), FTAG);
1157      -        dsl_dataset_rele(dmu_objset_ds(os), FTAG);
1158      -
1159      -        return (error);
1160      -}
1161      -
1162      -int
1163 1112  dmu_objset_snapshot_one(const char *fsname, const char *snapname)
1164 1113  {
1165 1114          int err;
1166 1115          char *longsnap = kmem_asprintf("%s@%s", fsname, snapname);
1167 1116          nvlist_t *snaps = fnvlist_alloc();
1168 1117  
1169 1118          fnvlist_add_boolean(snaps, longsnap);
1170 1119          strfree(longsnap);
1171 1120          err = dsl_dataset_snapshot(snaps, NULL, NULL);
1172 1121          fnvlist_free(snaps);

1173 1122          return (err);
1174 1123  }
1175 1124  
1176 1125  static void
1177 1126  dmu_objset_sync_dnodes(multilist_sublist_t *list, dmu_tx_t *tx)
1178 1127  {
1179 1128          dnode_t *dn;
1180 1129  
1181 1130          while ((dn = multilist_sublist_head(list)) != NULL) {
1182 1131                  ASSERT(dn->dn_object != DMU_META_DNODE_OBJECT);
1183 1132                  ASSERT(dn->dn_dbuf->db_data_pending);
1184 1133                  /*
1185 1134                   * Initialize dn_zio outside dnode_sync() because the
 1186 1135                   * meta-dnode needs to set it outside dnode_sync().
1187 1136                   */
1188 1137                  dn->dn_zio = dn->dn_dbuf->db_data_pending->dr_zio;
1189 1138                  ASSERT(dn->dn_zio);
1190 1139  
1191 1140                  ASSERT3U(dn->dn_nlevels, <=, DN_MAX_LEVELS);
1192 1141                  multilist_sublist_remove(list, dn);
1193 1142  
1194 1143                  multilist_t *newlist = dn->dn_objset->os_synced_dnodes;
1195 1144                  if (newlist != NULL) {
1196 1145                          (void) dnode_add_ref(dn, newlist);
1197 1146                          multilist_insert(newlist, dn);
1198 1147                  }
1199 1148  
1200 1149                  dnode_sync(dn, tx);
1201 1150          }
1202 1151  }
1203 1152  
1204 1153  /* ARGSUSED */
1205 1154  static void
1206 1155  dmu_objset_write_ready(zio_t *zio, arc_buf_t *abuf, void *arg)
1207 1156  {
1208 1157          blkptr_t *bp = zio->io_bp;
1209 1158          objset_t *os = arg;
1210 1159          dnode_phys_t *dnp = &os->os_phys->os_meta_dnode;
1211 1160  
1212 1161          ASSERT(!BP_IS_EMBEDDED(bp));
1213 1162          ASSERT3U(BP_GET_TYPE(bp), ==, DMU_OT_OBJSET);
1214 1163          ASSERT0(BP_GET_LEVEL(bp));
1215 1164  
1216 1165          /*
1217 1166           * Update rootbp fill count: it should be the number of objects
1218 1167           * allocated in the object set (not counting the "special"
1219 1168           * objects that are stored in the objset_phys_t -- the meta
1220 1169           * dnode and user/group accounting objects).
1221 1170           */
1222 1171          bp->blk_fill = 0;
1223 1172          for (int i = 0; i < dnp->dn_nblkptr; i++)
1224 1173                  bp->blk_fill += BP_GET_FILL(&dnp->dn_blkptr[i]);
1225 1174          if (os->os_dsl_dataset != NULL)
1226 1175                  rrw_enter(&os->os_dsl_dataset->ds_bp_rwlock, RW_WRITER, FTAG);
1227 1176          *os->os_rootbp = *bp;
1228 1177          if (os->os_dsl_dataset != NULL)
1229 1178                  rrw_exit(&os->os_dsl_dataset->ds_bp_rwlock, FTAG);
1230 1179  }
1231 1180  
1232 1181  /* ARGSUSED */
1233 1182  static void
1234 1183  dmu_objset_write_done(zio_t *zio, arc_buf_t *abuf, void *arg)
1235 1184  {
1236 1185          blkptr_t *bp = zio->io_bp;
1237 1186          blkptr_t *bp_orig = &zio->io_bp_orig;
1238 1187          objset_t *os = arg;
1239 1188  
1240 1189          if (zio->io_flags & ZIO_FLAG_IO_REWRITE) {
1241 1190                  ASSERT(BP_EQUAL(bp, bp_orig));
1242 1191          } else {
1243 1192                  dsl_dataset_t *ds = os->os_dsl_dataset;
1244 1193                  dmu_tx_t *tx = os->os_synctx;
1245 1194  
1246 1195                  (void) dsl_dataset_block_kill(ds, bp_orig, tx, B_TRUE);
1247 1196                  dsl_dataset_block_born(ds, bp, tx);
1248 1197          }
1249 1198          kmem_free(bp, sizeof (*bp));
1250 1199  }
1251 1200  
1252 1201  typedef struct sync_dnodes_arg {
1253 1202          multilist_t *sda_list;
1254 1203          int sda_sublist_idx;
1255 1204          multilist_t *sda_newlist;
1256 1205          dmu_tx_t *sda_tx;
1257 1206  } sync_dnodes_arg_t;
1258 1207  
1259 1208  static void
1260 1209  sync_dnodes_task(void *arg)
1261 1210  {
1262 1211          sync_dnodes_arg_t *sda = arg;
1263 1212  
1264 1213          multilist_sublist_t *ms =
1265 1214              multilist_sublist_lock(sda->sda_list, sda->sda_sublist_idx);
1266 1215  
1267 1216          dmu_objset_sync_dnodes(ms, sda->sda_tx);
1268 1217  
1269 1218          multilist_sublist_unlock(ms);
1270 1219  
1271 1220          kmem_free(sda, sizeof (*sda));
1272 1221  }
1273 1222  
1274 1223  
1275 1224  /* called from dsl */
1276 1225  void
1277 1226  dmu_objset_sync(objset_t *os, zio_t *pio, dmu_tx_t *tx)
1278 1227  {
1279 1228          int txgoff;
1280 1229          zbookmark_phys_t zb;
1281 1230          zio_prop_t zp;
1282 1231          zio_t *zio;
1283 1232          list_t *list;
1284 1233          dbuf_dirty_record_t *dr;
1285 1234          blkptr_t *blkptr_copy = kmem_alloc(sizeof (*os->os_rootbp), KM_SLEEP);
1286 1235          *blkptr_copy = *os->os_rootbp;
1287 1236  
1288 1237          dprintf_ds(os->os_dsl_dataset, "txg=%llu\n", tx->tx_txg);
1289 1238  
1290 1239          ASSERT(dmu_tx_is_syncing(tx));
1291 1240          /* XXX the write_done callback should really give us the tx... */
1292 1241          os->os_synctx = tx;
1293 1242  
1294 1243          if (os->os_dsl_dataset == NULL) {
1295 1244                  /*
1296 1245                   * This is the MOS.  If we have upgraded,
1297 1246                   * spa_max_replication() could change, so reset
1298 1247                   * os_copies here.
1299 1248                   */
1300 1249                  os->os_copies = spa_max_replication(os->os_spa);
1301 1250          }
1302 1251  
1303 1252          /*
1304 1253           * Create the root block IO
1305 1254           */

1306 1255          SET_BOOKMARK(&zb, os->os_dsl_dataset ?
1307 1256              os->os_dsl_dataset->ds_object : DMU_META_OBJSET,
1308 1257              ZB_ROOT_OBJECT, ZB_ROOT_LEVEL, ZB_ROOT_BLKID);
1309 1258          arc_release(os->os_phys_buf, &os->os_phys_buf);
1310 1259  
1311 1260          dmu_write_policy(os, NULL, 0, 0, &zp);
1312 1261  
1313 1262          zio = arc_write(pio, os->os_spa, tx->tx_txg,
1314 1263              blkptr_copy, os->os_phys_buf, DMU_OS_IS_L2CACHEABLE(os),
1315 1264              &zp, dmu_objset_write_ready, NULL, NULL, dmu_objset_write_done,
1316      -            os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
     1265 +            os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb, NULL);
1317 1266  
1318 1267          /*
1319 1268           * Sync special dnodes - the parent IO for the sync is the root block
1320 1269           */
1321 1270          DMU_META_DNODE(os)->dn_zio = zio;
1322 1271          dnode_sync(DMU_META_DNODE(os), tx);
1323 1272  
1324 1273          os->os_phys->os_flags = os->os_flags;
1325 1274  
1326 1275          if (DMU_USERUSED_DNODE(os) &&
1327 1276              DMU_USERUSED_DNODE(os)->dn_type != DMU_OT_NONE) {
1328 1277                  DMU_USERUSED_DNODE(os)->dn_zio = zio;
1329 1278                  dnode_sync(DMU_USERUSED_DNODE(os), tx);
1330 1279                  DMU_GROUPUSED_DNODE(os)->dn_zio = zio;
1331 1280                  dnode_sync(DMU_GROUPUSED_DNODE(os), tx);
1332 1281          }
1333 1282  
1334 1283          txgoff = tx->tx_txg & TXG_MASK;
1335 1284  
1336 1285          if (dmu_objset_userused_enabled(os)) {
1337 1286                  /*
1338 1287                   * We must create the list here because it uses the
1339 1288                   * dn_dirty_link[] of this txg.  But it may already
1340 1289                   * exist because we call dsl_dataset_sync() twice per txg.
1341 1290                   */
1342 1291                  if (os->os_synced_dnodes == NULL) {
1343 1292                          os->os_synced_dnodes =
1344 1293                              multilist_create(sizeof (dnode_t),
1345 1294                              offsetof(dnode_t, dn_dirty_link[txgoff]),
1346 1295                              dnode_multilist_index_func);
1347 1296                  } else {
1348 1297                          ASSERT3U(os->os_synced_dnodes->ml_offset, ==,
1349 1298                              offsetof(dnode_t, dn_dirty_link[txgoff]));
1350 1299                  }
1351 1300          }
1352 1301  
1353 1302          for (int i = 0;
1354 1303              i < multilist_get_num_sublists(os->os_dirty_dnodes[txgoff]); i++) {
1355 1304                  sync_dnodes_arg_t *sda = kmem_alloc(sizeof (*sda), KM_SLEEP);
1356 1305                  sda->sda_list = os->os_dirty_dnodes[txgoff];
1357 1306                  sda->sda_sublist_idx = i;
1358 1307                  sda->sda_tx = tx;
1359 1308                  (void) taskq_dispatch(dmu_objset_pool(os)->dp_sync_taskq,
1360 1309                      sync_dnodes_task, sda, 0);
1361 1310                  /* callback frees sda */
1362 1311          }
1363 1312          taskq_wait(dmu_objset_pool(os)->dp_sync_taskq);
1364 1313  
1365 1314          list = &DMU_META_DNODE(os)->dn_dirty_records[txgoff];
1366 1315          while ((dr = list_head(list)) != NULL) {
1367 1316                  ASSERT0(dr->dr_dbuf->db_level);
1368 1317                  list_remove(list, dr);
1369 1318                  if (dr->dr_zio)
1370 1319                          zio_nowait(dr->dr_zio);
1371 1320          }
1372 1321  
1373 1322          /* Enable dnode backfill if enough objects have been freed. */
1374 1323          if (os->os_freed_dnodes >= dmu_rescan_dnode_threshold) {
1375 1324                  os->os_rescan_dnodes = B_TRUE;
1376 1325                  os->os_freed_dnodes = 0;
1377 1326          }
1378 1327  
1379 1328          /*
1380 1329           * Free intent log blocks up to this tx.
1381 1330           */
1382 1331          zil_sync(os->os_zil, tx);
1383 1332          os->os_phys->os_zil_header = os->os_zil_header;
1384 1333          zio_nowait(zio);
1385 1334  }
1386 1335  
1387 1336  boolean_t
1388 1337  dmu_objset_is_dirty(objset_t *os, uint64_t txg)
1389 1338  {
1390 1339          return (!multilist_is_empty(os->os_dirty_dnodes[txg & TXG_MASK]));
1391 1340  }
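
Both dmu_objset_sync() and dmu_objset_is_dirty() select per-txg state with `txg & TXG_MASK`. The following is a minimal sketch of that ring-indexing idea, not the real implementation. It assumes TXG_SIZE is 4 (a power of two, so the mask works as a modulo); the `toy_objset_t` type, its `dirty_count` array, and the `toy_*` helpers are hypothetical stand-ins for the multilist-based `os_dirty_dnodes[]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in the txg headers. */
#define	TXG_SIZE	4
#define	TXG_MASK	(TXG_SIZE - 1)

/* Hypothetical stand-in for os_dirty_dnodes[]: one counter per slot. */
typedef struct toy_objset {
	int dirty_count[TXG_SIZE];
} toy_objset_t;

/* Map a transaction group number onto its slot in the ring. */
static int
txg_slot(uint64_t txg)
{
	return ((int)(txg & TXG_MASK));
}

/* Record that something was dirtied in the given txg. */
static void
toy_mark_dirty(toy_objset_t *os, uint64_t txg)
{
	os->dirty_count[txg_slot(txg)]++;
}

/* Analogue of dmu_objset_is_dirty(): is the slot for this txg non-empty? */
static bool
toy_is_dirty(const toy_objset_t *os, uint64_t txg)
{
	return (os->dirty_count[txg_slot(txg)] != 0);
}

/* Once a txg has synced, its slot is emptied so a later txg can reuse it. */
static void
toy_sync_done(toy_objset_t *os, uint64_t txg)
{
	assert(toy_is_dirty(os, txg));
	os->dirty_count[txg_slot(txg)] = 0;
}
```

Because txgs that differ by TXG_SIZE share a slot, the scheme only works if a slot is drained before the txg that will reuse it starts dirtying data; in the sketch that is the job of toy_sync_done().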
1392 1341  
1393 1342  static objset_used_cb_t *used_cbs[DMU_OST_NUMTYPES];
1394 1343  
1395 1344  void
1396 1345  dmu_objset_register_type(dmu_objset_type_t ost, objset_used_cb_t *cb)
1397 1346  {
1398 1347          used_cbs[ost] = cb;
1399 1348  }
1400 1349  
1401 1350  boolean_t
1402 1351  dmu_objset_userused_enabled(objset_t *os)
1403 1352  {
1404 1353          return (spa_version(os->os_spa) >= SPA_VERSION_USERSPACE &&
1405 1354              used_cbs[os->os_phys->os_type] != NULL &&
1406 1355              DMU_USERUSED_DNODE(os) != NULL);
1407 1356  }
1408 1357  
1409 1358  typedef struct userquota_node {
1410 1359          uint64_t uqn_id;
1411 1360          int64_t uqn_delta;
1412 1361          avl_node_t uqn_node;
1413 1362  } userquota_node_t;
1414 1363  
1415 1364  typedef struct userquota_cache {
1416 1365          avl_tree_t uqc_user_deltas;
1417 1366          avl_tree_t uqc_group_deltas;
1418 1367  } userquota_cache_t;
1419 1368  
1420 1369  static int
1421 1370  userquota_compare(const void *l, const void *r)
1422 1371  {
1423 1372          const userquota_node_t *luqn = l;
1424 1373          const userquota_node_t *ruqn = r;
1425 1374  
1426 1375          if (luqn->uqn_id < ruqn->uqn_id)
1427 1376                  return (-1);
1428 1377          if (luqn->uqn_id > ruqn->uqn_id)
1429 1378                  return (1);
1430 1379          return (0);
1431 1380  }
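
userquota_compare() returns -1, 0, or 1 through explicit comparisons rather than by subtracting the two ids. A plausible reason is that the ids are 64-bit unsigned values, so a difference truncated to `int` can report the wrong sign or a false equality. The sketch below illustrates the same three-way pattern with the standard qsort(); `toy_id_compare` and `toy_sort_ids` are hypothetical names, and the AVL tree used by the real code is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t keys, safe for the full range. */
static int
toy_id_compare(const void *l, const void *r)
{
	const uint64_t *lid = l;
	const uint64_t *rid = r;

	if (*lid < *rid)
		return (-1);
	if (*lid > *rid)
		return (1);
	return (0);
}

/* Sort an array of ids in ascending order using the comparator above. */
static void
toy_sort_ids(uint64_t *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	qsort(ids, n, sizeof (ids[0]), toy_id_compare);
}
```

For example, comparing `1ULL << 32` with `0` by subtraction and casting the result to `int` would yield 0, which an ordered container would treat as a duplicate key.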
1432 1381  
1433 1382  static void
1434 1383  do_userquota_cacheflush(objset_t *os, userquota_cache_t *cache, dmu_tx_t *tx)
1435 1384  {
1436 1385          void *cookie;
1437 1386          userquota_node_t *uqn;
1438 1387  
1439 1388          ASSERT(dmu_tx_is_syncing(tx));
1440 1389  
1441 1390          cookie = NULL;
1442 1391          while ((uqn = avl_destroy_nodes(&cache->uqc_user_deltas,
1443 1392              &cookie)) != NULL) {
1444 1393                  /*
1445 1394                   * os_userused_lock protects against concurrent calls to
1446 1395                   * zap_increment_int().  It's needed because zap_increment_int()
1447 1396                   * is not thread-safe (i.e. not atomic).
1448 1397                   */
1449 1398                  mutex_enter(&os->os_userused_lock);
1450 1399                  VERIFY0(zap_increment_int(os, DMU_USERUSED_OBJECT,
1451 1400                      uqn->uqn_id, uqn->uqn_delta, tx));
1452 1401                  mutex_exit(&os->os_userused_lock);
1453 1402                  kmem_free(uqn, sizeof (*uqn));
1454 1403          }
1455 1404          avl_destroy(&cache->uqc_user_deltas);
1456 1405  
1457 1406          cookie = NULL;
1458 1407          while ((uqn = avl_destroy_nodes(&cache->uqc_group_deltas,
1459 1408              &cookie)) != NULL) {
1460 1409                  mutex_enter(&os->os_userused_lock);
1461 1410                  VERIFY0(zap_increment_int(os, DMU_GROUPUSED_OBJECT,
1462 1411                      uqn->uqn_id, uqn->uqn_delta, tx));
1463 1412                  mutex_exit(&os->os_userused_lock);
1464 1413                  kmem_free(uqn, sizeof (*uqn));
1465 1414          }
1466 1415          avl_destroy(&cache->uqc_group_deltas);
1467 1416  }
1468 1417  
1469 1418  static void
1470 1419  userquota_update_cache(avl_tree_t *avl, uint64_t id, int64_t delta)
1471 1420  {
1472 1421          userquota_node_t search = { .uqn_id = id };
1473 1422          avl_index_t idx;
1474 1423  
1475 1424          userquota_node_t *uqn = avl_find(avl, &search, &idx);
1476 1425          if (uqn == NULL) {
1477 1426                  uqn = kmem_zalloc(sizeof (*uqn), KM_SLEEP);
1478 1427                  uqn->uqn_id = id;
1479 1428                  avl_insert(avl, uqn, idx);
1480 1429          }
1481 1430          uqn->uqn_delta += delta;
1482 1431  }
1483 1432  
1484 1433  static void
1485 1434  do_userquota_update(userquota_cache_t *cache, uint64_t used, uint64_t flags,
1486 1435      uint64_t user, uint64_t group, boolean_t subtract)
1487 1436  {
1488 1437          if ((flags & DNODE_FLAG_USERUSED_ACCOUNTED)) {
1489 1438                  int64_t delta = DNODE_SIZE + used;
1490 1439                  if (subtract)
1491 1440                          delta = -delta;
1492 1441  
1493 1442                  userquota_update_cache(&cache->uqc_user_deltas, user, delta);
1494 1443                  userquota_update_cache(&cache->uqc_group_deltas, group, delta);
1495 1444          }
1496 1445  }
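
The functions above accumulate signed per-id deltas in memory and apply each net delta once in do_userquota_cacheflush(), instead of updating the on-disk ZAP for every dnode. A minimal sketch of that accumulate-then-flush pattern follows. It is an illustration under stated assumptions: it swaps the AVL tree for a small linear-scan array, it leaves out the DNODE_SIZE term that the real delta includes, and every `toy_*` name is hypothetical. The `totals` array stands in for the backing store that zap_increment_int() updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_CACHE_MAX	16

typedef struct toy_delta {
	uint64_t td_id;		/* user or group id */
	int64_t td_delta;	/* net change in bytes for this id */
} toy_delta_t;

typedef struct toy_cache {
	toy_delta_t tc_entries[TOY_CACHE_MAX];
	size_t tc_count;
} toy_cache_t;

/* Find or create the entry for id, then add delta to it. */
static void
toy_cache_update(toy_cache_t *tc, uint64_t id, int64_t delta)
{
	for (size_t i = 0; i < tc->tc_count; i++) {
		if (tc->tc_entries[i].td_id == id) {
			tc->tc_entries[i].td_delta += delta;
			return;
		}
	}
	assert(tc->tc_count < TOY_CACHE_MAX);
	tc->tc_entries[tc->tc_count].td_id = id;
	tc->tc_entries[tc->tc_count].td_delta = delta;
	tc->tc_count++;
}

/* Same sign handling as do_userquota_update(): old usage is subtracted. */
static void
toy_account(toy_cache_t *tc, uint64_t id, uint64_t used, int subtract)
{
	int64_t delta = (int64_t)used;

	if (subtract)
		delta = -delta;
	toy_cache_update(tc, id, delta);
}

/* Apply each net delta once, empty the cache, return the update count. */
static size_t
toy_cache_flush(toy_cache_t *tc, int64_t *totals, size_t ntotals)
{
	size_t n = tc->tc_count;

	for (size_t i = 0; i < n; i++) {
		assert(tc->tc_entries[i].td_id < ntotals);
		totals[tc->tc_entries[i].td_id] += tc->tc_entries[i].td_delta;
	}
	tc->tc_count = 0;
	return (n);
}
```

The benefit is that many dnodes owned by the same id collapse into one update at flush time, which also shortens the time spent under a lock such as `os_userused_lock`.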
1497 1446  
1498 1447  typedef struct userquota_updates_arg {
1499 1448          objset_t *uua_os;
1500 1449          int uua_sublist_idx;
1501 1450          dmu_tx_t *uua_tx;
1502 1451  } userquota_updates_arg_t;
1503 1452  
1504 1453  static void
1505 1454  userquota_updates_task(void *arg)
1506 1455  {
1507 1456          userquota_updates_arg_t *uua = arg;
1508 1457          objset_t *os = uua->uua_os;
1509 1458          dmu_tx_t *tx = uua->uua_tx;
1510 1459          dnode_t *dn;
1511 1460          userquota_cache_t cache = { 0 };
1512 1461  
1513 1462          multilist_sublist_t *list =
1514 1463              multilist_sublist_lock(os->os_synced_dnodes, uua->uua_sublist_idx);
1515 1464  
1516 1465          ASSERT(multilist_sublist_head(list) == NULL ||
1517 1466              dmu_objset_userused_enabled(os));
1518 1467          avl_create(&cache.uqc_user_deltas, userquota_compare,
1519 1468              sizeof (userquota_node_t), offsetof(userquota_node_t, uqn_node));
1520 1469          avl_create(&cache.uqc_group_deltas, userquota_compare,
1521 1470              sizeof (userquota_node_t), offsetof(userquota_node_t, uqn_node));
1522 1471  
1523 1472          while ((dn = multilist_sublist_head(list)) != NULL) {
1524 1473                  int flags;
1525 1474                  ASSERT(!DMU_OBJECT_IS_SPECIAL(dn->dn_object));
1526 1475                  ASSERT(dn->dn_phys->dn_type == DMU_OT_NONE ||
1527 1476                      dn->dn_phys->dn_flags &
1528 1477                      DNODE_FLAG_USERUSED_ACCOUNTED);
1529 1478  
1530 1479                  flags = dn->dn_id_flags;
1531 1480                  ASSERT(flags);
1532 1481                  if (flags & DN_ID_OLD_EXIST)  {
1533 1482                          do_userquota_update(&cache,
1534 1483                              dn->dn_oldused, dn->dn_oldflags,
1535 1484                              dn->dn_olduid, dn->dn_oldgid, B_TRUE);
1536 1485                  }
1537 1486                  if (flags & DN_ID_NEW_EXIST) {
1538 1487                          do_userquota_update(&cache,
1539 1488                              DN_USED_BYTES(dn->dn_phys),
1540 1489                              dn->dn_phys->dn_flags,  dn->dn_newuid,
1541 1490                              dn->dn_newgid, B_FALSE);
1542 1491                  }
1543 1492  
1544 1493                  mutex_enter(&dn->dn_mtx);
1545 1494                  dn->dn_oldused = 0;
1546 1495                  dn->dn_oldflags = 0;
1547 1496                  if (dn->dn_id_flags & DN_ID_NEW_EXIST) {
1548 1497                          dn->dn_olduid = dn->dn_newuid;
1549 1498                          dn->dn_oldgid = dn->dn_newgid;
1550 1499                          dn->dn_id_flags |= DN_ID_OLD_EXIST;
1551 1500                          if (dn->dn_bonuslen == 0)
1552 1501                                  dn->dn_id_flags |= DN_ID_CHKED_SPILL;
1553 1502                          else
1554 1503                                  dn->dn_id_flags |= DN_ID_CHKED_BONUS;
1555 1504                  }
1556 1505                  dn->dn_id_flags &= ~(DN_ID_NEW_EXIST);
1557 1506                  mutex_exit(&dn->dn_mtx);
1558 1507  
1559 1508                  multilist_sublist_remove(list, dn);
1560 1509                  dnode_rele(dn, os->os_synced_dnodes);
1561 1510          }
1562 1511          do_userquota_cacheflush(os, &cache, tx);
1563 1512          multilist_sublist_unlock(list);
1564 1513          kmem_free(uua, sizeof (*uua));
1565 1514  }
1566 1515  
1567 1516  void
1568 1517  dmu_objset_do_userquota_updates(objset_t *os, dmu_tx_t *tx)
1569 1518  {
1570 1519          if (!dmu_objset_userused_enabled(os))
1571 1520                  return;
1572 1521  
1573 1522          /* Allocate the user/groupused objects if necessary. */
1574 1523          if (DMU_USERUSED_DNODE(os)->dn_type == DMU_OT_NONE) {
1575 1524                  VERIFY0(zap_create_claim(os,
1576 1525                      DMU_USERUSED_OBJECT,
1577 1526                      DMU_OT_USERGROUP_USED, DMU_OT_NONE, 0, tx));
1578 1527                  VERIFY0(zap_create_claim(os,
1579 1528                      DMU_GROUPUSED_OBJECT,
1580 1529                      DMU_OT_USERGROUP_USED, DMU_OT_NONE, 0, tx));
1581 1530          }
1582 1531  
1583 1532          for (int i = 0;
1584 1533              i < multilist_get_num_sublists(os->os_synced_dnodes); i++) {
1585 1534                  userquota_updates_arg_t *uua =
1586 1535                      kmem_alloc(sizeof (*uua), KM_SLEEP);
1587 1536                  uua->uua_os = os;
1588 1537                  uua->uua_sublist_idx = i;
1589 1538                  uua->uua_tx = tx;
1590 1539                  /* note: caller does taskq_wait() */
1591 1540                  (void) taskq_dispatch(dmu_objset_pool(os)->dp_sync_taskq,
1592 1541                      userquota_updates_task, uua, 0);
1593 1542                  /* callback frees uua */
1594 1543          }
1595 1544  }
1596 1545  
1597 1546  /*
1598 1547   * Returns a pointer to the data from which the uid/gid can be found.
1599 1548   *
1600 1549   * If a dirty record for the transaction group that is syncing can't
1601 1550   * be found, then NULL is returned.  In the NULL case it is assumed
1602 1551   * the uid/gid aren't changing.
1603 1552   */
1604 1553  static void *
1605 1554  dmu_objset_userquota_find_data(dmu_buf_impl_t *db, dmu_tx_t *tx)
1606 1555  {
1607 1556          dbuf_dirty_record_t *dr, **drp;
1608 1557          void *data;
1609 1558  
1610 1559          if (db->db_dirtycnt == 0)
1611 1560                  return (db->db.db_data);  /* Nothing is changing */
1612 1561  
1613 1562          for (drp = &db->db_last_dirty; (dr = *drp) != NULL; drp = &dr->dr_next)
1614 1563                  if (dr->dr_txg == tx->tx_txg)
1615 1564                          break;
1616 1565  
1617 1566          if (dr == NULL) {
1618 1567                  data = NULL;
1619 1568          } else {
1620 1569                  dnode_t *dn;
1621 1570  
1622 1571                  DB_DNODE_ENTER(dr->dr_dbuf);
1623 1572                  dn = DB_DNODE(dr->dr_dbuf);
1624 1573  
1625 1574                  if (dn->dn_bonuslen == 0 &&
1626 1575                      dr->dr_dbuf->db_blkid == DMU_SPILL_BLKID)
1627 1576                          data = dr->dt.dl.dr_data->b_data;
1628 1577                  else
1629 1578                          data = dr->dt.dl.dr_data;
1630 1579  
1631 1580                  DB_DNODE_EXIT(dr->dr_dbuf);
1632 1581          }
1633 1582  
1634 1583          return (data);
1635 1584  }
1636 1585  
1637 1586  void
1638 1587  dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx)
1639 1588  {
1640 1589          objset_t *os = dn->dn_objset;
1641 1590          void *data = NULL;
1642 1591          dmu_buf_impl_t *db = NULL;
1643 1592          uint64_t *user = NULL;
1644 1593          uint64_t *group = NULL;
1645 1594          int flags = dn->dn_id_flags;
1646 1595          int error;
1647 1596          boolean_t have_spill = B_FALSE;
1648 1597  
1649 1598          if (!dmu_objset_userused_enabled(dn->dn_objset))
1650 1599                  return;
1651 1600  
1652 1601          if (before && (flags & (DN_ID_CHKED_BONUS|DN_ID_OLD_EXIST|
1653 1602              DN_ID_CHKED_SPILL)))
1654 1603                  return;
1655 1604  
1656 1605          if (before && dn->dn_bonuslen != 0)
1657 1606                  data = DN_BONUS(dn->dn_phys);
1658 1607          else if (!before && dn->dn_bonuslen != 0) {
1659 1608                  if (dn->dn_bonus) {
1660 1609                          db = dn->dn_bonus;
1661 1610                          mutex_enter(&db->db_mtx);
1662 1611                          data = dmu_objset_userquota_find_data(db, tx);
1663 1612                  } else {
1664 1613                          data = DN_BONUS(dn->dn_phys);
1665 1614                  }
1666 1615          } else if (dn->dn_bonuslen == 0 && dn->dn_bonustype == DMU_OT_SA) {
1667 1616                          int rf = 0;
1668 1617  
1669 1618                          if (RW_WRITE_HELD(&dn->dn_struct_rwlock))
1670 1619                                  rf |= DB_RF_HAVESTRUCT;
1671 1620                          error = dmu_spill_hold_by_dnode(dn,
1672 1621                              rf | DB_RF_MUST_SUCCEED,
1673 1622                              FTAG, (dmu_buf_t **)&db);
1674 1623                          ASSERT(error == 0);
1675 1624                          mutex_enter(&db->db_mtx);
1676 1625                          data = (before) ? db->db.db_data :
1677 1626                              dmu_objset_userquota_find_data(db, tx);
1678 1627                          have_spill = B_TRUE;
1679 1628          } else {
1680 1629                  mutex_enter(&dn->dn_mtx);
1681 1630                  dn->dn_id_flags |= DN_ID_CHKED_BONUS;
1682 1631                  mutex_exit(&dn->dn_mtx);
1683 1632                  return;
1684 1633          }
1685 1634  
1686 1635          if (before) {
1687 1636                  ASSERT(data);
1688 1637                  user = &dn->dn_olduid;
1689 1638                  group = &dn->dn_oldgid;
1690 1639          } else if (data) {
1691 1640                  user = &dn->dn_newuid;
1692 1641                  group = &dn->dn_newgid;
1693 1642          }
1694 1643  
1695 1644          /*
1696 1645           * Must always call the callback in case the object
1697 1646           * type has changed and that type isn't an object type to track
1698 1647           */
1699 1648          error = used_cbs[os->os_phys->os_type](dn->dn_bonustype, data,
1700 1649              user, group);
1701 1650  
1702 1651          /*
1703 1652           * Preserve existing uid/gid when the callback can't determine
1704 1653           * what the new uid/gid are and the callback returned EEXIST.
1705 1654           * The EEXIST error tells us to just use the existing uid/gid.
1706 1655           * If we don't know what the old values are then just assign
1707 1656           * them to 0, since that is a new file being created.
1708 1657           */
1709 1658          if (!before && data == NULL && error == EEXIST) {
1710 1659                  if (flags & DN_ID_OLD_EXIST) {
1711 1660                          dn->dn_newuid = dn->dn_olduid;
1712 1661                          dn->dn_newgid = dn->dn_oldgid;
1713 1662                  } else {
1714 1663                          dn->dn_newuid = 0;
1715 1664                          dn->dn_newgid = 0;
1716 1665                  }
1717 1666                  error = 0;
1718 1667          }
1719 1668  
1720 1669          if (db)
1721 1670                  mutex_exit(&db->db_mtx);
1722 1671  
1723 1672          mutex_enter(&dn->dn_mtx);
1724 1673          if (error == 0 && before)
1725 1674                  dn->dn_id_flags |= DN_ID_OLD_EXIST;
1726 1675          if (error == 0 && !before)
1727 1676                  dn->dn_id_flags |= DN_ID_NEW_EXIST;
1728 1677  
1729 1678          if (have_spill) {
1730 1679                  dn->dn_id_flags |= DN_ID_CHKED_SPILL;
1731 1680          } else {
1732 1681                  dn->dn_id_flags |= DN_ID_CHKED_BONUS;
1733 1682          }
1734 1683          mutex_exit(&dn->dn_mtx);
1735 1684          if (have_spill)
1736 1685                  dmu_buf_rele((dmu_buf_t *)db, FTAG);
1737 1686  }
1738 1687  
1739 1688  boolean_t
1740 1689  dmu_objset_userspace_present(objset_t *os)
1741 1690  {
1742 1691          return (os->os_phys->os_flags &
1743 1692              OBJSET_FLAG_USERACCOUNTING_COMPLETE);
1744 1693  }
1745 1694  
1746 1695  int
1747 1696  dmu_objset_userspace_upgrade(objset_t *os)
1748 1697  {
1749 1698          uint64_t obj;
1750 1699          int err = 0;
1751 1700  
1752 1701          if (dmu_objset_userspace_present(os))
1753 1702                  return (0);
1754 1703          if (!dmu_objset_userused_enabled(os))
1755 1704                  return (SET_ERROR(ENOTSUP));
1756 1705          if (dmu_objset_is_snapshot(os))
1757 1706                  return (SET_ERROR(EINVAL));
1758 1707  
1759 1708          /*
1760 1709           * We simply need to mark every object dirty, so that it will be
1761 1710           * synced out and now accounted.  If this is called
1762 1711           * concurrently, or if we already did some work before crashing,
1763 1712           * that's fine, since we track each object's accounted state
1764 1713           * independently.
1765 1714           */
1766 1715  
1767 1716          for (obj = 0; err == 0; err = dmu_object_next(os, &obj, FALSE, 0)) {
1768 1717                  dmu_tx_t *tx;
1769 1718                  dmu_buf_t *db;
1770 1719                  int objerr;
1771 1720  
1772 1721                  if (issig(JUSTLOOKING) && issig(FORREAL))
1773 1722                          return (SET_ERROR(EINTR));
1774 1723  
1775 1724                  objerr = dmu_bonus_hold(os, obj, FTAG, &db);
1776 1725                  if (objerr != 0)
1777 1726                          continue;
1778 1727                  tx = dmu_tx_create(os);
1779 1728                  dmu_tx_hold_bonus(tx, obj);
1780 1729                  objerr = dmu_tx_assign(tx, TXG_WAIT);
1781 1730                  if (objerr != 0) {
1782 1731                          dmu_tx_abort(tx);
1783 1732                          continue;
1784 1733                  }
1785 1734                  dmu_buf_will_dirty(db, tx);
1786 1735                  dmu_buf_rele(db, FTAG);
1787 1736                  dmu_tx_commit(tx);
1788 1737          }
1789 1738  
1790 1739          os->os_flags |= OBJSET_FLAG_USERACCOUNTING_COMPLETE;
1791 1740          txg_wait_synced(dmu_objset_pool(os), 0);
1792 1741          return (0);
1793 1742  }
1794 1743  
1795 1744  void
1796 1745  dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
1797 1746      uint64_t *usedobjsp, uint64_t *availobjsp)
1798 1747  {
1799 1748          dsl_dataset_space(os->os_dsl_dataset, refdbytesp, availbytesp,
1800 1749              usedobjsp, availobjsp);
1801 1750  }
1802 1751  
1803 1752  uint64_t
1804 1753  dmu_objset_fsid_guid(objset_t *os)
1805 1754  {
1806 1755          return (dsl_dataset_fsid_guid(os->os_dsl_dataset));
1807 1756  }
1808 1757  
1809 1758  void
1810 1759  dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat)
1811 1760  {
1812 1761          stat->dds_type = os->os_phys->os_type;
1813 1762          if (os->os_dsl_dataset)
1814 1763                  dsl_dataset_fast_stat(os->os_dsl_dataset, stat);
1815 1764  }
1816 1765  
1817 1766  void
1818 1767  dmu_objset_stats(objset_t *os, nvlist_t *nv)
1819 1768  {
1820 1769          ASSERT(os->os_dsl_dataset ||
1821 1770              os->os_phys->os_type == DMU_OST_META);
1822 1771  
1823 1772          if (os->os_dsl_dataset != NULL)
1824 1773                  dsl_dataset_stats(os->os_dsl_dataset, nv);
1825 1774  
1826 1775          dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_TYPE,
1827 1776              os->os_phys->os_type);
1828 1777          dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_USERACCOUNTING,
1829 1778              dmu_objset_userspace_present(os));
1830 1779  }
1831 1780  
1832 1781  int
1833 1782  dmu_objset_is_snapshot(objset_t *os)
1834 1783  {
1835 1784          if (os->os_dsl_dataset != NULL)
1836 1785                  return (os->os_dsl_dataset->ds_is_snapshot);
1837 1786          else
1838 1787                  return (B_FALSE);
1839 1788  }
1840 1789  
1841 1790  int
1842 1791  dmu_snapshot_realname(objset_t *os, char *name, char *real, int maxlen,
1843 1792      boolean_t *conflict)
1844 1793  {
1845 1794          dsl_dataset_t *ds = os->os_dsl_dataset;

1846 1795          uint64_t ignored;
1847 1796  
1848 1797          if (dsl_dataset_phys(ds)->ds_snapnames_zapobj == 0)
1849 1798                  return (SET_ERROR(ENOENT));
1850 1799  
1851 1800          return (zap_lookup_norm(ds->ds_dir->dd_pool->dp_meta_objset,
1852 1801              dsl_dataset_phys(ds)->ds_snapnames_zapobj, name, 8, 1, &ignored,
1853 1802              MT_NORMALIZE, real, maxlen, conflict));
1854 1803  }
1855 1804  
     1805 +int
     1806 +dmu_clone_list_next(objset_t *os, int len, char *name,
     1807 +    uint64_t *idp, uint64_t *offp)
     1808 +{
     1809 +        dsl_dataset_t *ds = os->os_dsl_dataset, *clone;
     1810 +        zap_cursor_t cursor;
     1811 +        zap_attribute_t attr;
     1812 +        char buf[MAXNAMELEN];
     1813 +
     1814 +        ASSERT(dsl_pool_config_held(dmu_objset_pool(os)));
     1815 +
     1816 +        if (dsl_dataset_phys(ds)->ds_next_clones_obj == 0)
     1817 +                return (SET_ERROR(ENOENT));
     1818 +
     1819 +        zap_cursor_init_serialized(&cursor,
     1820 +            ds->ds_dir->dd_pool->dp_meta_objset,
     1821 +            dsl_dataset_phys(ds)->ds_next_clones_obj, *offp);
     1822 +
     1823 +        if (zap_cursor_retrieve(&cursor, &attr) != 0) {
     1824 +                zap_cursor_fini(&cursor);
     1825 +                return (SET_ERROR(ENOENT));
     1826 +        }
     1827 +
     1828 +        VERIFY0(dsl_dataset_hold_obj(ds->ds_dir->dd_pool,
     1829 +            attr.za_first_integer, FTAG, &clone));
     1830 +
     1831 +        dsl_dir_name(clone->ds_dir, buf);
     1832 +
     1833 +        dsl_dataset_rele(clone, FTAG);
     1834 +
     1835 +        if (strlen(buf) >= len) {
     1836 +                zap_cursor_fini(&cursor);
     1837 +                return (SET_ERROR(ENAMETOOLONG));
     1838 +        }
     1839 +
     1840 +        (void) strcpy(name, buf);
     1841 +        if (idp != NULL)
     1842 +                *idp = attr.za_first_integer;
     1843 +
     1844 +        zap_cursor_advance(&cursor);
     1845 +        *offp = zap_cursor_serialize(&cursor);
     1846 +        zap_cursor_fini(&cursor);
     1847 +
     1848 +        return (0);
     1849 +}
     1850 +
1856 1851  int
1857 1852  dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
1858 1853      uint64_t *idp, uint64_t *offp, boolean_t *case_conflict)
1859 1854  {
1860 1855          dsl_dataset_t *ds = os->os_dsl_dataset;
1861 1856          zap_cursor_t cursor;
1862 1857          zap_attribute_t attr;
1863 1858  
1864 1859          ASSERT(dsl_pool_config_held(dmu_objset_pool(os)));
1865 1860
1866 1861          if (dsl_dataset_phys(ds)->ds_snapnames_zapobj == 0)
1867 1862                  return (SET_ERROR(ENOENT));
1868 1863  
1869 1864          zap_cursor_init_serialized(&cursor,
1870 1865              ds->ds_dir->dd_pool->dp_meta_objset,
1871 1866              dsl_dataset_phys(ds)->ds_snapnames_zapobj, *offp);
1872 1867  
1873 1868          if (zap_cursor_retrieve(&cursor, &attr) != 0) {
1874 1869                  zap_cursor_fini(&cursor);
1875 1870                  return (SET_ERROR(ENOENT));
1876 1871          }
1877 1872  
1878 1873          if (strlen(attr.za_name) + 1 > namelen) {
1879 1874                  zap_cursor_fini(&cursor);
1880 1875                  return (SET_ERROR(ENAMETOOLONG));
1881 1876          }
1882 1877  
1883 1878          (void) strcpy(name, attr.za_name);
1884 1879          if (idp)
1885 1880                  *idp = attr.za_first_integer;
1886 1881          if (case_conflict)
1887 1882                  *case_conflict = attr.za_normalization_conflict;
1888 1883          zap_cursor_advance(&cursor);
1889 1884          *offp = zap_cursor_serialize(&cursor);
1890 1885          zap_cursor_fini(&cursor);
1891 1886  
1892 1887          return (0);
1893 1888  }
1894 1889  
1895 1890  int
1896 1891  dmu_dir_list_next(objset_t *os, int namelen, char *name,
1897 1892      uint64_t *idp, uint64_t *offp)
1898 1893  {
1899 1894          dsl_dir_t *dd = os->os_dsl_dataset->ds_dir;
1900 1895          zap_cursor_t cursor;
1901 1896          zap_attribute_t attr;
1902 1897  
1903 1898          /* there is no next dir on a snapshot! */
1904 1899          if (os->os_dsl_dataset->ds_object !=
1905 1900              dsl_dir_phys(dd)->dd_head_dataset_obj)
1906 1901                  return (SET_ERROR(ENOENT));
1907 1902  
1908 1903          zap_cursor_init_serialized(&cursor,
1909 1904              dd->dd_pool->dp_meta_objset,
1910 1905              dsl_dir_phys(dd)->dd_child_dir_zapobj, *offp);
1911 1906  
1912 1907          if (zap_cursor_retrieve(&cursor, &attr) != 0) {
1913 1908                  zap_cursor_fini(&cursor);
1914 1909                  return (SET_ERROR(ENOENT));
1915 1910          }
1916 1911  
1917 1912          if (strlen(attr.za_name) + 1 > namelen) {
1918 1913                  zap_cursor_fini(&cursor);
1919 1914                  return (SET_ERROR(ENAMETOOLONG));
1920 1915          }
1921 1916  
1922 1917          (void) strcpy(name, attr.za_name);
1923 1918          if (idp)
1924 1919                  *idp = attr.za_first_integer;
1925 1920          zap_cursor_advance(&cursor);
1926 1921          *offp = zap_cursor_serialize(&cursor);
1927 1922          zap_cursor_fini(&cursor);
1928 1923  
1929 1924          return (0);
1930 1925  }
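
dmu_clone_list_next(), dmu_snapshot_list_next(), and dmu_dir_list_next() share one calling convention: the caller passes an opaque offset in `*offp`, gets back one name, and the offset is advanced only on success. A rough sketch of that contract and of a caller loop is shown below. The `toy_names` table, `toy_list_next()`, and `toy_count_all()` are hypothetical; a plain array index stands in for the serialized ZAP cursor, and `namelen` is assumed to be positive.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backing data; the real code walks a ZAP object. */
static const char *toy_names[] = { "alpha", "beta", "gamma" };
#define	TOY_NNAMES	(sizeof (toy_names) / sizeof (toy_names[0]))

/*
 * Copy the entry at *offp into name and advance *offp.
 * Returns ENOENT past the end and ENAMETOOLONG if the name plus its
 * terminating NUL does not fit; *offp is left unchanged on error.
 */
static int
toy_list_next(int namelen, char *name, uint64_t *offp)
{
	const char *cur;

	assert(namelen > 0);
	if (*offp >= TOY_NNAMES)
		return (ENOENT);

	cur = toy_names[*offp];
	if (strlen(cur) + 1 > (size_t)namelen)
		return (ENAMETOOLONG);

	(void) strcpy(name, cur);
	*offp += 1;
	return (0);
}

/* Typical caller: start at offset 0 and iterate until an error ends it. */
static int
toy_count_all(void)
{
	char name[32];
	uint64_t off = 0;
	int n = 0;

	while (toy_list_next(sizeof (name), name, &off) == 0)
		n++;
	return (n);
}
```

The bounds check `strlen(cur) + 1 > namelen` is equivalent to the `strlen(buf) >= len` form used in dmu_clone_list_next(); both reject a name that would leave no room for the terminator before strcpy() runs.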
1931 1926  
1932 1927  typedef struct dmu_objset_find_ctx {
1933 1928          taskq_t         *dc_tq;
1934 1929          dsl_pool_t      *dc_dp;
1935 1930          uint64_t        dc_ddobj;
1936 1931          char            *dc_ddname; /* last component of ddobj's name */
1937 1932          int             (*dc_func)(dsl_pool_t *, dsl_dataset_t *, void *);
1938 1933          void            *dc_arg;
1939 1934          int             dc_flags;
1940 1935          kmutex_t        *dc_error_lock;
1941 1936          int             *dc_error;
1942 1937  } dmu_objset_find_ctx_t;
1943 1938  
1944 1939  static void
1945 1940  dmu_objset_find_dp_impl(dmu_objset_find_ctx_t *dcp)
1946 1941  {
1947 1942          dsl_pool_t *dp = dcp->dc_dp;
1948 1943          dsl_dir_t *dd;
1949 1944          dsl_dataset_t *ds;
1950 1945          zap_cursor_t zc;
1951 1946          zap_attribute_t *attr;
1952 1947          uint64_t thisobj;
1953 1948          int err = 0;
1954 1949  
1955 1950          /* don't process if there already was an error */
1956 1951          if (*dcp->dc_error != 0)
1957 1952                  goto out;
1958 1953  
1959 1954          /*
1960 1955           * Note: passing the name (dc_ddname) here is optional, but it
1961 1956           * improves performance because we don't need to call
1962 1957           * zap_value_search() to determine the name.
1963 1958           */
1964 1959          err = dsl_dir_hold_obj(dp, dcp->dc_ddobj, dcp->dc_ddname, FTAG, &dd);
1965 1960          if (err != 0)
1966 1961                  goto out;
1967 1962  
1968 1963          /* Don't visit hidden ($MOS & $ORIGIN) objsets. */
1969 1964          if (dd->dd_myname[0] == '$') {
1970 1965                  dsl_dir_rele(dd, FTAG);
1971 1966                  goto out;
1972 1967          }
1973 1968  
1974 1969          thisobj = dsl_dir_phys(dd)->dd_head_dataset_obj;
1975 1970          attr = kmem_alloc(sizeof (zap_attribute_t), KM_SLEEP);
1976 1971  
1977 1972          /*
1978 1973           * Iterate over all children.
1979 1974           */
1980 1975          if (dcp->dc_flags & DS_FIND_CHILDREN) {
1981 1976                  for (zap_cursor_init(&zc, dp->dp_meta_objset,
1982 1977                      dsl_dir_phys(dd)->dd_child_dir_zapobj);
1983 1978                      zap_cursor_retrieve(&zc, attr) == 0;
1984 1979                      (void) zap_cursor_advance(&zc)) {
1985 1980                          ASSERT3U(attr->za_integer_length, ==,
1986 1981                              sizeof (uint64_t));
1987 1982                          ASSERT3U(attr->za_num_integers, ==, 1);
1988 1983  
1989 1984                          dmu_objset_find_ctx_t *child_dcp =
1990 1985                              kmem_alloc(sizeof (*child_dcp), KM_SLEEP);
1991 1986                          *child_dcp = *dcp;
1992 1987                          child_dcp->dc_ddobj = attr->za_first_integer;
1993 1988                          child_dcp->dc_ddname = spa_strdup(attr->za_name);
1994 1989                          if (dcp->dc_tq != NULL)
1995 1990                                  (void) taskq_dispatch(dcp->dc_tq,
1996 1991                                      dmu_objset_find_dp_cb, child_dcp, TQ_SLEEP);
1997 1992                          else
1998 1993                                  dmu_objset_find_dp_impl(child_dcp);
1999 1994                  }
2000 1995                  zap_cursor_fini(&zc);
2001 1996          }
2002 1997  
2003 1998          /*
2004 1999           * Iterate over all snapshots.
2005 2000           */
2006 2001          if (dcp->dc_flags & DS_FIND_SNAPSHOTS) {
2007 2002                  dsl_dataset_t *ds;
2008 2003                  err = dsl_dataset_hold_obj(dp, thisobj, FTAG, &ds);
2009 2004  
2010 2005                  if (err == 0) {
2011 2006                          uint64_t snapobj;
2012 2007  
2013 2008                          snapobj = dsl_dataset_phys(ds)->ds_snapnames_zapobj;
2014 2009                          dsl_dataset_rele(ds, FTAG);
2015 2010  
2016 2011                          for (zap_cursor_init(&zc, dp->dp_meta_objset, snapobj);
2017 2012                              zap_cursor_retrieve(&zc, attr) == 0;
2018 2013                              (void) zap_cursor_advance(&zc)) {
2019 2014                                  ASSERT3U(attr->za_integer_length, ==,
2020 2015                                      sizeof (uint64_t));
2021 2016                                  ASSERT3U(attr->za_num_integers, ==, 1);
2022 2017  
2023 2018                                  err = dsl_dataset_hold_obj(dp,
2024 2019                                      attr->za_first_integer, FTAG, &ds);
2025 2020                                  if (err != 0)
2026 2021                                          break;
2027 2022                                  err = dcp->dc_func(dp, ds, dcp->dc_arg);
2028 2023                                  dsl_dataset_rele(ds, FTAG);
2029 2024                                  if (err != 0)
2030 2025                                          break;
2031 2026                          }
2032 2027                          zap_cursor_fini(&zc);
2033 2028                  }
2034 2029          }
2035 2030  
2036 2031          kmem_free(attr, sizeof (zap_attribute_t));
2037 2032  
2038 2033          if (err != 0) {
2039 2034                  dsl_dir_rele(dd, FTAG);
2040 2035                  goto out;
2041 2036          }
2042 2037  
2043 2038          /*
2044 2039           * Apply to self.
2045 2040           */
2046 2041          err = dsl_dataset_hold_obj(dp, thisobj, FTAG, &ds);
2047 2042  
2048 2043          /*
2049 2044           * Note: we hold the dir while calling dsl_dataset_hold_obj() so
2050 2045           * that the dir will remain cached, and we won't have to re-instantiate
2051 2046           * it (which could be expensive due to finding its name via
2052 2047           * zap_value_search()).
2053 2048           */
2054 2049          dsl_dir_rele(dd, FTAG);
2055 2050          if (err != 0)
2056 2051                  goto out;
2057 2052          err = dcp->dc_func(dp, ds, dcp->dc_arg);
2058 2053          dsl_dataset_rele(ds, FTAG);
2059 2054  
2060 2055  out:
2061 2056          if (err != 0) {
2062 2057                  mutex_enter(dcp->dc_error_lock);
2063 2058                  /* only keep first error */
2064 2059                  if (*dcp->dc_error == 0)
2065 2060                          *dcp->dc_error = err;
2066 2061                  mutex_exit(dcp->dc_error_lock);
2067 2062          }
2068 2063  
2069 2064          if (dcp->dc_ddname != NULL)
2070 2065                  spa_strfree(dcp->dc_ddname);
2071 2066          kmem_free(dcp, sizeof (*dcp));
2072 2067  }
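
The "only keep first error" handling at the `out:` label can be tried outside the kernel. The following is a minimal userland sketch, not the kernel code: it assumes POSIX threads in place of `kmutex_t`, and `first_error_t`, `first_error_init()` and `first_error_record()` are hypothetical names introduced only for illustration.

```c
#include <pthread.h>

/* Hypothetical userland stand-in for the dc_error / dc_error_lock pair. */
typedef struct first_error {
	pthread_mutex_t	fe_lock;	/* protects fe_error */
	int		fe_error;	/* 0 until the first failure is recorded */
} first_error_t;

static void
first_error_init(first_error_t *fe)
{
	(void) pthread_mutex_init(&fe->fe_lock, NULL);
	fe->fe_error = 0;
}

/*
 * Record err unless an earlier worker already stored a failure.
 * A zero err is ignored, mirroring the "if (err != 0)" guard above.
 */
static void
first_error_record(first_error_t *fe, int err)
{
	if (err == 0)
		return;
	(void) pthread_mutex_lock(&fe->fe_lock);
	if (fe->fe_error == 0)
		fe->fe_error = err;
	(void) pthread_mutex_unlock(&fe->fe_lock);
}
```

Because every worker shares one such structure, whichever failing worker takes the lock first determines the value that is finally returned; later errors are dropped rather than overwriting it.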
2073 2068  
2074 2069  static void
2075 2070  dmu_objset_find_dp_cb(void *arg)
2076 2071  {
2077 2072          dmu_objset_find_ctx_t *dcp = arg;
2078 2073          dsl_pool_t *dp = dcp->dc_dp;
2079 2074  
2080 2075          /*
2081 2076           * We need to get a pool_config_lock here, as there are several
2082 2077           * assert(pool_config_held) down the stack. Getting a lock via
2083 2078           * dsl_pool_config_enter is risky, as it might be stalled by a
2084 2079           * pending writer. This would deadlock, as the write lock can
2085 2080           * only be granted when our parent thread gives up the lock.
2086 2081           * The _prio interface gives us priority over a pending writer.
2087 2082           */
2088 2083          dsl_pool_config_enter_prio(dp, FTAG);
2089 2084  
2090 2085          dmu_objset_find_dp_impl(dcp);
2091 2086  
2092 2087          dsl_pool_config_exit(dp, FTAG);
2093 2088  }
2094 2089  
2095 2090  /*
2096 2091   * Find objsets under and including ddobj, call func(ds) on each.
2097 2092   * The order for the enumeration is completely undefined.
2098 2093   * func is called with dsl_pool_config held.
2099 2094   */
2100 2095  int
2101 2096  dmu_objset_find_dp(dsl_pool_t *dp, uint64_t ddobj,
2102 2097      int func(dsl_pool_t *, dsl_dataset_t *, void *), void *arg, int flags)
2103 2098  {
2104 2099          int error = 0;
2105 2100          taskq_t *tq = NULL;
2106 2101          int ntasks;
2107 2102          dmu_objset_find_ctx_t *dcp;
2108 2103          kmutex_t err_lock;
2109 2104  
2110 2105          mutex_init(&err_lock, NULL, MUTEX_DEFAULT, NULL);
2111 2106          dcp = kmem_alloc(sizeof (*dcp), KM_SLEEP);
2112 2107          dcp->dc_tq = NULL;
2113 2108          dcp->dc_dp = dp;
2114 2109          dcp->dc_ddobj = ddobj;
2115 2110          dcp->dc_ddname = NULL;
2116 2111          dcp->dc_func = func;
2117 2112          dcp->dc_arg = arg;
2118 2113          dcp->dc_flags = flags;
2119 2114          dcp->dc_error_lock = &err_lock;
2120 2115          dcp->dc_error = &error;
2121 2116  
2122 2117          if ((flags & DS_FIND_SERIALIZE) || dsl_pool_config_held_writer(dp)) {
2123 2118                  /*
2124 2119                   * In case a write lock is held we can't make use of
2125 2120                   * parallelism, as down the stack of the worker threads
2126 2121                   * the lock is asserted via dsl_pool_config_held.
2127 2122                   * In case of a read lock this is solved by getting a read
2128 2123                   * lock in each worker thread, which isn't possible in case
2129 2124                   * of a writer lock. So we fall back to the synchronous path
2130 2125                   * here.
2131 2126                   * In the future it might be possible to get some magic into
2132 2127                   * dsl_pool_config_held in a way that it returns true for
2133 2128                   * the worker threads so that a single lock held from this
2134 2129                   * thread suffices. For now, stay single threaded.
2135 2130                   */
2136 2131                  dmu_objset_find_dp_impl(dcp);
2137 2132                  mutex_destroy(&err_lock);
2138 2133  
2139 2134                  return (error);
2140 2135          }
2141 2136  
2142 2137          ntasks = dmu_find_threads;
2143 2138          if (ntasks == 0)
2144 2139                  ntasks = vdev_count_leaves(dp->dp_spa) * 4;
2145 2140          tq = taskq_create("dmu_objset_find", ntasks, minclsyspri, ntasks,
2146 2141              INT_MAX, 0);
2147 2142          if (tq == NULL) {
2148 2143                  kmem_free(dcp, sizeof (*dcp));
2149 2144                  mutex_destroy(&err_lock);
2150 2145  
2151 2146                  return (SET_ERROR(ENOMEM));
2152 2147          }
2153 2148          dcp->dc_tq = tq;
2154 2149  
2155 2150          /* dcp will be freed by task */
2156 2151          (void) taskq_dispatch(tq, dmu_objset_find_dp_cb, dcp, TQ_SLEEP);
2157 2152  
2158 2153          /*
2159 2154           * PORTING: this code relies on the property of taskq_wait to wait
2160 2155           * until no more tasks are queued and no more tasks are active. As
2161 2156           * we always queue new tasks from within other tasks, taskq_wait
2162 2157           * reliably waits for the full recursion to finish, even though we
2163 2158           * enqueue new tasks after taskq_wait has been called.
2164 2159           * On platforms other than illumos, taskq_wait may not have this
2165 2160           * property.
2166 2161           */
2167 2162          taskq_wait(tq);
2168 2163          taskq_destroy(tq);
2169 2164          mutex_destroy(&err_lock);
2170 2165  
2171 2166          return (error);
2172 2167  }
2173 2168  
2174 2169  /*
2175 2170   * Find all objsets under name, and for each, call 'func(child_name, arg)'.
2176 2171   * The dp_config_rwlock must not be held when this is called, and it
2177 2172   * will not be held when the callback is called.
2178 2173   * Therefore this function should only be used when the pool is not changing
2179 2174   * (e.g. in syncing context), or the callback can deal with the possible races.
2180 2175   */
2181 2176  static int
2182 2177  dmu_objset_find_impl(spa_t *spa, const char *name,
2183 2178      int func(const char *, void *), void *arg, int flags)
2184 2179  {
2185 2180          dsl_dir_t *dd;
2186 2181          dsl_pool_t *dp = spa_get_dsl(spa);
2187 2182          dsl_dataset_t *ds;
2188 2183          zap_cursor_t zc;
2189 2184          zap_attribute_t *attr;
2190 2185          char *child;
2191 2186          uint64_t thisobj;
2192 2187          int err;
2193 2188  
2194 2189          dsl_pool_config_enter(dp, FTAG);
2195 2190  
2196 2191          err = dsl_dir_hold(dp, name, FTAG, &dd, NULL);
2197 2192          if (err != 0) {
2198 2193                  dsl_pool_config_exit(dp, FTAG);
2199 2194                  return (err);
2200 2195          }
2201 2196  
2202 2197          /* Don't visit hidden ($MOS & $ORIGIN) objsets. */
2203 2198          if (dd->dd_myname[0] == '$') {
2204 2199                  dsl_dir_rele(dd, FTAG);
2205 2200                  dsl_pool_config_exit(dp, FTAG);
2206 2201                  return (0);
2207 2202          }
2208 2203  
2209 2204          thisobj = dsl_dir_phys(dd)->dd_head_dataset_obj;
2210 2205          attr = kmem_alloc(sizeof (zap_attribute_t), KM_SLEEP);
2211 2206  
2212 2207          /*
2213 2208           * Iterate over all children.
2214 2209           */
2215 2210          if (flags & DS_FIND_CHILDREN) {
2216 2211                  for (zap_cursor_init(&zc, dp->dp_meta_objset,
2217 2212                      dsl_dir_phys(dd)->dd_child_dir_zapobj);
2218 2213                      zap_cursor_retrieve(&zc, attr) == 0;
2219 2214                      (void) zap_cursor_advance(&zc)) {
2220 2215                          ASSERT3U(attr->za_integer_length, ==,
2221 2216                              sizeof (uint64_t));
2222 2217                          ASSERT3U(attr->za_num_integers, ==, 1);
2223 2218  
2224 2219                          child = kmem_asprintf("%s/%s", name, attr->za_name);
2225 2220                          dsl_pool_config_exit(dp, FTAG);
2226 2221                          err = dmu_objset_find_impl(spa, child,
2227 2222                              func, arg, flags);
2228 2223                          dsl_pool_config_enter(dp, FTAG);
2229 2224                          strfree(child);
2230 2225                          if (err != 0)
2231 2226                                  break;
2232 2227                  }
2233 2228                  zap_cursor_fini(&zc);
2234 2229  
2235 2230                  if (err != 0) {
2236 2231                          dsl_dir_rele(dd, FTAG);
2237 2232                          dsl_pool_config_exit(dp, FTAG);
2238 2233                          kmem_free(attr, sizeof (zap_attribute_t));
2239 2234                          return (err);
2240 2235                  }
2241 2236          }
2242 2237  
2243 2238          /*
2244 2239           * Iterate over all snapshots.
2245 2240           */
2246 2241          if (flags & DS_FIND_SNAPSHOTS) {
2247 2242                  err = dsl_dataset_hold_obj(dp, thisobj, FTAG, &ds);
2248 2243  
2249 2244                  if (err == 0) {
2250 2245                          uint64_t snapobj;
2251 2246  
2252 2247                          snapobj = dsl_dataset_phys(ds)->ds_snapnames_zapobj;
2253 2248                          dsl_dataset_rele(ds, FTAG);
2254 2249  
2255 2250                          for (zap_cursor_init(&zc, dp->dp_meta_objset, snapobj);
2256 2251                              zap_cursor_retrieve(&zc, attr) == 0;
2257 2252                              (void) zap_cursor_advance(&zc)) {
2258 2253                                  ASSERT3U(attr->za_integer_length, ==,
2259 2254                                      sizeof (uint64_t));
2260 2255                                  ASSERT3U(attr->za_num_integers, ==, 1);
2261 2256  
2262 2257                                  child = kmem_asprintf("%s@%s",
2263 2258                                      name, attr->za_name);
2264 2259                                  dsl_pool_config_exit(dp, FTAG);
2265 2260                                  err = func(child, arg);
2266 2261                                  dsl_pool_config_enter(dp, FTAG);
2267 2262                                  strfree(child);
2268 2263                                  if (err != 0)
2269 2264                                          break;
2270 2265                          }
2271 2266                          zap_cursor_fini(&zc);
2272 2267                  }
2273 2268          }
2274 2269  
2275 2270          dsl_dir_rele(dd, FTAG);
2276 2271          kmem_free(attr, sizeof (zap_attribute_t));
2277 2272          dsl_pool_config_exit(dp, FTAG);
2278 2273  
2279 2274          if (err != 0)
2280 2275                  return (err);
2281 2276  
2282 2277          /* Apply to self. */
2283 2278          return (func(name, arg));
2284 2279  }
2285 2280  
2286 2281  /*
2287 2282   * See comment above dmu_objset_find_impl().
2288 2283   */
2289 2284  int
2290 2285  dmu_objset_find(char *name, int func(const char *, void *), void *arg,
2291 2286      int flags)
2292 2287  {
2293 2288          spa_t *spa;
2294 2289          int error;
2295 2290  
2296 2291          error = spa_open(name, &spa, FTAG);
2297 2292          if (error != 0)
2298 2293                  return (error);
2299 2294          error = dmu_objset_find_impl(spa, name, func, arg, flags);
2300 2295          spa_close(spa, FTAG);
2301 2296          return (error);
2302 2297  }
2303 2298  
2304 2299  void
2305 2300  dmu_objset_set_user(objset_t *os, void *user_ptr)
2306 2301  {
2307 2302          ASSERT(MUTEX_HELD(&os->os_user_ptr_lock));
2308 2303          os->os_user_ptr = user_ptr;
2309 2304  }
2310 2305  
2311 2306  void *
2312 2307  dmu_objset_get_user(objset_t *os)
2313 2308  {
2314 2309          ASSERT(MUTEX_HELD(&os->os_user_ptr_lock));
2315 2310          return (os->os_user_ptr);
2316 2311  }
2317 2312  
2318 2313  /*
2319 2314   * Determine name of filesystem, given name of snapshot.
2320 2315   * buf must be at least ZFS_MAX_DATASET_NAME_LEN bytes
2321 2316   */
2322 2317  int
2323 2318  dmu_fsname(const char *snapname, char *buf)
2324 2319  {
2325 2320          char *atp = strchr(snapname, '@');
2326 2321          if (atp == NULL)
2327 2322                  return (SET_ERROR(EINVAL));
2328 2323          if (atp - snapname >= ZFS_MAX_DATASET_NAME_LEN)
2329 2324                  return (SET_ERROR(ENAMETOOLONG));
2330 2325          (void) strlcpy(buf, snapname, atp - snapname + 1);
2331 2326          return (0);
2332 2327  }
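
The snapshot-name split in `dmu_fsname()` can be exercised in userland with a small sketch like the one below. It is an approximation under stated assumptions, not the kernel function: `fsname_from_snapname()` is a hypothetical name, `MAX_NAME_LEN` stands in for `ZFS_MAX_DATASET_NAME_LEN` (assumed to be 256 here), plain errno values replace `SET_ERROR()`, and `memcpy()` plus an explicit terminator replaces `strlcpy()`, which not every libc provides.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for ZFS_MAX_DATASET_NAME_LEN. */
#define	MAX_NAME_LEN	256

/*
 * Hypothetical userland analogue of dmu_fsname(): copy the part of
 * snapname that precedes '@' into buf, which must hold at least
 * MAX_NAME_LEN bytes.
 */
static int
fsname_from_snapname(const char *snapname, char *buf)
{
	const char *atp = strchr(snapname, '@');
	size_t len;

	if (atp == NULL)
		return (EINVAL);	/* not a snapshot name */
	len = (size_t)(atp - snapname);
	if (len >= MAX_NAME_LEN)
		return (ENAMETOOLONG);	/* would not fit with terminator */
	(void) memcpy(buf, snapname, len);
	buf[len] = '\0';
	return (0);
}
```

Called with "tank/home@monday", the sketch stores "tank/home" in buf and returns 0; a name without '@' yields EINVAL.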
2333 2328  
2334 2329  /*
2335 2330   * Call when we think we're going to write/free space in open context to track
2336 2331   * the amount of dirty data in the open txg, which is also the amount
2337 2332   * of memory that can not be evicted until this txg syncs.
2338 2333   */
2339 2334  void
2340 2335  dmu_objset_willuse_space(objset_t *os, int64_t space, dmu_tx_t *tx)
2341 2336  {
2342 2337          dsl_dataset_t *ds = os->os_dsl_dataset;
2343 2338          int64_t aspace = spa_get_worst_case_asize(os->os_spa, space);
2344 2339  
2345 2340          if (ds != NULL) {
2346 2341                  dsl_dir_willuse_space(ds->ds_dir, aspace, tx);
2347 2342                  dsl_pool_dirty_space(dmu_tx_pool(tx), space, tx);
2348 2343          }
2349 2344  }

484 lines elided
