    
NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
NEX-4582 update wrc test cases to allow use of write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5692 expose the number of hole blocks in a file
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-3669 Faults for fans that don't exist
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
NEX-3891 Hide the snapshots that belong to in-kernel autosnap-service
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #40: ZDB shouldn't crash with new code
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
Fixup merge results
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1
    
      
    
    
          --- old/usr/src/uts/common/fs/zfs/sys/dmu.h
          +++ new/usr/src/uts/common/fs/zfs/sys/dmu.h
   1    1  /*
   2    2   * CDDL HEADER START
   3    3   *
   4    4   * The contents of this file are subject to the terms of the
   5    5   * Common Development and Distribution License (the "License").
   6    6   * You may not use this file except in compliance with the License.
   7    7   *
   8    8   * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9    9   * or http://www.opensolaris.org/os/licensing.
  10   10   * See the License for the specific language governing permissions
  11   11   * and limitations under the License.
  12   12   *
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  
  
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  
  22   22  /*
  23   23   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24   24   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
  25      - * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
       25 + * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
  26   26   * Copyright (c) 2012, Joyent, Inc. All rights reserved.
  27   27   * Copyright 2013 DEY Storage Systems, Inc.
  28   28   * Copyright 2014 HybridCluster. All rights reserved.
  29   29   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  30   30   * Copyright 2013 Saso Kiselkov. All rights reserved.
  31   31   * Copyright (c) 2014 Integros [integros.com]
  32   32   */
  33   33  
  34   34  /* Portions Copyright 2010 Robert Milkowski */
  35   35  
  36   36  #ifndef _SYS_DMU_H
  37   37  #define _SYS_DMU_H
  38   38  
  39   39  /*
  40   40   * This file describes the interface that the DMU provides for its
  41   41   * consumers.
  42   42   *
  43   43   * The DMU also interacts with the SPA.  That interface is described in
  44   44   * dmu_spa.h.
  45   45   */
  46   46  
  47   47  #include <sys/zfs_context.h>
  48   48  #include <sys/inttypes.h>
  49   49  #include <sys/cred.h>
  50   50  #include <sys/fs/zfs.h>
  51   51  #include <sys/zio_compress.h>
  52   52  #include <sys/zio_priority.h>
  53   53  
  54   54  #ifdef  __cplusplus
  55   55  extern "C" {
  56   56  #endif
  57   57  
  58   58  struct uio;
  59   59  struct xuio;
  60   60  struct page;
  61   61  struct vnode;
  62   62  struct spa;
  63   63  struct zilog;
  64   64  struct zio;
  65   65  struct blkptr;
  66   66  struct zap_cursor;
  67   67  struct dsl_dataset;
  68   68  struct dsl_pool;
  69   69  struct dnode;
  70   70  struct drr_begin;
  71   71  struct drr_end;
  72   72  struct zbookmark_phys;
  73   73  struct spa;
  74   74  struct nvlist;
  75   75  struct arc_buf;
  76   76  struct zio_prop;
  77   77  struct sa_handle;
  78   78  
  79   79  typedef struct objset objset_t;
  80   80  typedef struct dmu_tx dmu_tx_t;
  81   81  typedef struct dsl_dir dsl_dir_t;
  82   82  typedef struct dnode dnode_t;
  83   83  
  84   84  typedef enum dmu_object_byteswap {
  85   85          DMU_BSWAP_UINT8,
  86   86          DMU_BSWAP_UINT16,
  87   87          DMU_BSWAP_UINT32,
  88   88          DMU_BSWAP_UINT64,
  89   89          DMU_BSWAP_ZAP,
  90   90          DMU_BSWAP_DNODE,
  91   91          DMU_BSWAP_OBJSET,
  92   92          DMU_BSWAP_ZNODE,
  93   93          DMU_BSWAP_OLDACL,
  94   94          DMU_BSWAP_ACL,
  95   95          /*
  96   96           * Allocating a new byteswap type number makes the on-disk format
  97   97           * incompatible with any other format that uses the same number.
  98   98           *
  99   99           * Data can usually be structured to work with one of the
 100  100           * DMU_BSWAP_UINT* or DMU_BSWAP_ZAP types.
 101  101           */
  
  
 102  102          DMU_BSWAP_NUMFUNCS
 103  103  } dmu_object_byteswap_t;
 104  104  
 105  105  #define DMU_OT_NEWTYPE 0x80
 106  106  #define DMU_OT_METADATA 0x40
 107  107  #define DMU_OT_BYTESWAP_MASK 0x3f
 108  108  
 109  109  /*
 110  110   * Defines a uint8_t object type. Object types specify if the data
 111  111   * in the object is metadata (boolean) and how to byteswap the data
 112      - * (dmu_object_byteswap_t).
      112 + * (dmu_object_byteswap_t). All of the types created by this method
      113 + * are cached in the dbuf metadata cache.
 113  114   */
 114  115  #define DMU_OT(byteswap, metadata) \
 115  116          (DMU_OT_NEWTYPE | \
 116  117          ((metadata) ? DMU_OT_METADATA : 0) | \
 117  118          ((byteswap) & DMU_OT_BYTESWAP_MASK))
 118  119  
 119  120  #define DMU_OT_IS_VALID(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 120  121          ((ot) & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS : \
 121  122          (ot) < DMU_OT_NUMTYPES)
 122  123  
 123  124  #define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 124  125          ((ot) & DMU_OT_METADATA) : \
 125  126          dmu_ot[(ot)].ot_metadata)
 126  127  
      128 +#define DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
      129 +        B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
      130 +
 127  131  /*
 128  132   * These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
 129  133   * have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
 130  134   * is repurposed for embedded BPs.
 131  135   */
 132  136  #define DMU_OT_HAS_FILL(ot) \
 133  137          ((ot) == DMU_OT_DNODE || (ot) == DMU_OT_OBJSET)
 134  138  
 135  139  #define DMU_OT_BYTESWAP(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 136  140          ((ot) & DMU_OT_BYTESWAP_MASK) : \
 137  141          dmu_ot[(ot)].ot_byteswap)
 138  142  
 139  143  typedef enum dmu_object_type {
 140  144          DMU_OT_NONE,
 141  145          /* general: */
 142  146          DMU_OT_OBJECT_DIRECTORY,        /* ZAP */
 143  147          DMU_OT_OBJECT_ARRAY,            /* UINT64 */
 144  148          DMU_OT_PACKED_NVLIST,           /* UINT8 (XDR by nvlist_pack/unpack) */
 145  149          DMU_OT_PACKED_NVLIST_SIZE,      /* UINT64 */
 146  150          DMU_OT_BPOBJ,                   /* UINT64 */
 147  151          DMU_OT_BPOBJ_HDR,               /* UINT64 */
 148  152          /* spa: */
 149  153          DMU_OT_SPACE_MAP_HEADER,        /* UINT64 */
 150  154          DMU_OT_SPACE_MAP,               /* UINT64 */
 151  155          /* zil: */
 152  156          DMU_OT_INTENT_LOG,              /* UINT64 */
 153  157          /* dmu: */
 154  158          DMU_OT_DNODE,                   /* DNODE */
 155  159          DMU_OT_OBJSET,                  /* OBJSET */
 156  160          /* dsl: */
 157  161          DMU_OT_DSL_DIR,                 /* UINT64 */
 158  162          DMU_OT_DSL_DIR_CHILD_MAP,       /* ZAP */
 159  163          DMU_OT_DSL_DS_SNAP_MAP,         /* ZAP */
 160  164          DMU_OT_DSL_PROPS,               /* ZAP */
 161  165          DMU_OT_DSL_DATASET,             /* UINT64 */
 162  166          /* zpl: */
 163  167          DMU_OT_ZNODE,                   /* ZNODE */
 164  168          DMU_OT_OLDACL,                  /* Old ACL */
 165  169          DMU_OT_PLAIN_FILE_CONTENTS,     /* UINT8 */
 166  170          DMU_OT_DIRECTORY_CONTENTS,      /* ZAP */
 167  171          DMU_OT_MASTER_NODE,             /* ZAP */
 168  172          DMU_OT_UNLINKED_SET,            /* ZAP */
 169  173          /* zvol: */
 170  174          DMU_OT_ZVOL,                    /* UINT8 */
 171  175          DMU_OT_ZVOL_PROP,               /* ZAP */
 172  176          /* other; for testing only! */
 173  177          DMU_OT_PLAIN_OTHER,             /* UINT8 */
 174  178          DMU_OT_UINT64_OTHER,            /* UINT64 */
 175  179          DMU_OT_ZAP_OTHER,               /* ZAP */
 176  180          /* new object types: */
 177  181          DMU_OT_ERROR_LOG,               /* ZAP */
 178  182          DMU_OT_SPA_HISTORY,             /* UINT8 */
 179  183          DMU_OT_SPA_HISTORY_OFFSETS,     /* spa_his_phys_t */
 180  184          DMU_OT_POOL_PROPS,              /* ZAP */
 181  185          DMU_OT_DSL_PERMS,               /* ZAP */
 182  186          DMU_OT_ACL,                     /* ACL */
 183  187          DMU_OT_SYSACL,                  /* SYSACL */
 184  188          DMU_OT_FUID,                    /* FUID table (Packed NVLIST UINT8) */
 185  189          DMU_OT_FUID_SIZE,               /* FUID table size UINT64 */
 186  190          DMU_OT_NEXT_CLONES,             /* ZAP */
 187  191          DMU_OT_SCAN_QUEUE,              /* ZAP */
 188  192          DMU_OT_USERGROUP_USED,          /* ZAP */
 189  193          DMU_OT_USERGROUP_QUOTA,         /* ZAP */
 190  194          DMU_OT_USERREFS,                /* ZAP */
 191  195          DMU_OT_DDT_ZAP,                 /* ZAP */
 192  196          DMU_OT_DDT_STATS,               /* ZAP */
 193  197          DMU_OT_SA,                      /* System attr */
 194  198          DMU_OT_SA_MASTER_NODE,          /* ZAP */
 195  199          DMU_OT_SA_ATTR_REGISTRATION,    /* ZAP */
 196  200          DMU_OT_SA_ATTR_LAYOUTS,         /* ZAP */
 197  201          DMU_OT_SCAN_XLATE,              /* ZAP */
 198  202          DMU_OT_DEDUP,                   /* fake dedup BP from ddt_bp_create() */
 199  203          DMU_OT_DEADLIST,                /* ZAP */
 200  204          DMU_OT_DEADLIST_HDR,            /* UINT64 */
 201  205          DMU_OT_DSL_CLONES,              /* ZAP */
 202  206          DMU_OT_BPOBJ_SUBOBJ,            /* UINT64 */
 203  207          /*
 204  208           * Do not allocate new object types here. Doing so makes the on-disk
 205  209           * format incompatible with any other format that uses the same object
 206  210           * type number.
 207  211           *
 208  212           * When creating an object which does not have one of the above types
 209  213           * use the DMU_OTN_* type with the correct byteswap and metadata
 210  214           * values.
 211  215           *
 212  216           * The DMU_OTN_* types do not have entries in the dmu_ot table,
 213  217           * use the DMU_OT_IS_METADATA() and DMU_OT_BYTESWAP() macros instead
 214  218           * of indexing into dmu_ot directly (this works for both DMU_OT_* types
 215  219           * and DMU_OTN_* types).
 216  220           */
 217  221          DMU_OT_NUMTYPES,
 218  222  
 219  223          /*
 220  224           * Names for valid types declared with DMU_OT().
 221  225           */
 222  226          DMU_OTN_UINT8_DATA = DMU_OT(DMU_BSWAP_UINT8, B_FALSE),
 223  227          DMU_OTN_UINT8_METADATA = DMU_OT(DMU_BSWAP_UINT8, B_TRUE),
  
  
 224  228          DMU_OTN_UINT16_DATA = DMU_OT(DMU_BSWAP_UINT16, B_FALSE),
 225  229          DMU_OTN_UINT16_METADATA = DMU_OT(DMU_BSWAP_UINT16, B_TRUE),
 226  230          DMU_OTN_UINT32_DATA = DMU_OT(DMU_BSWAP_UINT32, B_FALSE),
 227  231          DMU_OTN_UINT32_METADATA = DMU_OT(DMU_BSWAP_UINT32, B_TRUE),
 228  232          DMU_OTN_UINT64_DATA = DMU_OT(DMU_BSWAP_UINT64, B_FALSE),
 229  233          DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
 230  234          DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
 231  235          DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
 232  236  } dmu_object_type_t;
 233  237  
      238 +typedef enum txg_how {
      239 +        TXG_WAIT = 1,
      240 +        TXG_NOWAIT,
      241 +        TXG_WAITED,
      242 +} txg_how_t;
      243 +
 234  244  /*
 235      - * These flags are intended to be used to specify the "txg_how"
 236      - * parameter when calling the dmu_tx_assign() function. See the comment
 237      - * above dmu_tx_assign() for more details on the meaning of these flags.
      245 + * Selected classes of metadata
 238  246   */
 239      -#define TXG_NOWAIT      (0ULL)
 240      -#define TXG_WAIT        (1ULL<<0)
 241      -#define TXG_NOTHROTTLE  (1ULL<<1)
      247 +#define DMU_OT_IS_DDT_META(type)        \
      248 +        ((type == DMU_OT_DDT_ZAP) ||    \
      249 +        (type == DMU_OT_DDT_STATS))
 242  250  
      251 +#define DMU_OT_IS_ZPL_META(type)                \
      252 +        ((type == DMU_OT_ZNODE) ||              \
      253 +        (type == DMU_OT_OLDACL) ||              \
      254 +        (type == DMU_OT_DIRECTORY_CONTENTS) ||  \
      255 +        (type == DMU_OT_MASTER_NODE) ||         \
      256 +        (type == DMU_OT_UNLINKED_SET))
      257 +
 243  258  void byteswap_uint64_array(void *buf, size_t size);
 244  259  void byteswap_uint32_array(void *buf, size_t size);
 245  260  void byteswap_uint16_array(void *buf, size_t size);
 246  261  void byteswap_uint8_array(void *buf, size_t size);
 247  262  void zap_byteswap(void *buf, size_t size);
 248  263  void zfs_oldacl_byteswap(void *buf, size_t size);
 249  264  void zfs_acl_byteswap(void *buf, size_t size);
 250  265  void zfs_znode_byteswap(void *buf, size_t size);
 251  266  
 252  267  #define DS_FIND_SNAPSHOTS       (1<<0)
 253  268  #define DS_FIND_CHILDREN        (1<<1)
 254  269  #define DS_FIND_SERIALIZE       (1<<2)
 255  270  
 256  271  /*
 257  272   * The maximum number of bytes that can be accessed as part of one
 258  273   * operation, including metadata.
 259  274   */
 260  275  #define DMU_MAX_ACCESS (32 * 1024 * 1024) /* 32MB */
 261  276  #define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */
 262  277  
 263  278  #define DMU_USERUSED_OBJECT     (-1ULL)
 264  279  #define DMU_GROUPUSED_OBJECT    (-2ULL)
 265  280  
 266  281  /*
 267  282   * artificial blkids for bonus buffer and spill blocks
 268  283   */
 269  284  #define DMU_BONUS_BLKID         (-1ULL)
 270  285  #define DMU_SPILL_BLKID         (-2ULL)
 271  286  /*
 272  287   * Public routines to create, destroy, open, and close objsets.
 273  288   */
 274  289  int dmu_objset_hold(const char *name, void *tag, objset_t **osp);
 275  290  int dmu_objset_own(const char *name, dmu_objset_type_t type,
 276  291      boolean_t readonly, void *tag, objset_t **osp);
 277  292  void dmu_objset_rele(objset_t *os, void *tag);
 278  293  void dmu_objset_disown(objset_t *os, void *tag);
 279  294  int dmu_objset_open_ds(struct dsl_dataset *ds, objset_t **osp);
 280  295  
 281  296  void dmu_objset_evict_dbufs(objset_t *os);
 282  297  int dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
 283  298      void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
  
  
 284  299  int dmu_objset_clone(const char *name, const char *origin);
 285  300  int dsl_destroy_snapshots_nvl(struct nvlist *snaps, boolean_t defer,
 286  301      struct nvlist *errlist);
 287  302  int dmu_objset_snapshot_one(const char *fsname, const char *snapname);
 288  303  int dmu_objset_snapshot_tmp(const char *, const char *, int);
 289  304  int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
 290  305      int flags);
 291  306  void dmu_objset_byteswap(void *buf, size_t size);
 292  307  int dsl_dataset_rename_snapshot(const char *fsname,
 293  308      const char *oldsnapname, const char *newsnapname, boolean_t recursive);
 294      -int dmu_objset_remap_indirects(const char *fsname);
 295  309  
 296  310  typedef struct dmu_buf {
 297  311          uint64_t db_object;             /* object that this buffer is part of */
 298  312          uint64_t db_offset;             /* byte offset in this object */
 299  313          uint64_t db_size;               /* size of buffer in bytes */
 300  314          void *db_data;                  /* data in buffer */
 301  315  } dmu_buf_t;
 302  316  
 303  317  /*
 304  318   * The names of zap entries in the DIRECTORY_OBJECT of the MOS.
 305  319   */
 306  320  #define DMU_POOL_DIRECTORY_OBJECT       1
 307  321  #define DMU_POOL_CONFIG                 "config"
 308  322  #define DMU_POOL_FEATURES_FOR_WRITE     "features_for_write"
 309  323  #define DMU_POOL_FEATURES_FOR_READ      "features_for_read"
 310  324  #define DMU_POOL_FEATURE_DESCRIPTIONS   "feature_descriptions"
 311  325  #define DMU_POOL_FEATURE_ENABLED_TXG    "feature_enabled_txg"
 312  326  #define DMU_POOL_ROOT_DATASET           "root_dataset"
 313  327  #define DMU_POOL_SYNC_BPOBJ             "sync_bplist"
 314  328  #define DMU_POOL_ERRLOG_SCRUB           "errlog_scrub"
 315  329  #define DMU_POOL_ERRLOG_LAST            "errlog_last"
 316  330  #define DMU_POOL_SPARES                 "spares"
 317  331  #define DMU_POOL_DEFLATE                "deflate"
 318  332  #define DMU_POOL_HISTORY                "history"
 319  333  #define DMU_POOL_PROPS                  "pool_props"
 320  334  #define DMU_POOL_L2CACHE                "l2cache"
  
  
 321  335  #define DMU_POOL_TMP_USERREFS           "tmp_userrefs"
 322  336  #define DMU_POOL_DDT                    "DDT-%s-%s-%s"
 323  337  #define DMU_POOL_DDT_STATS              "DDT-statistics"
 324  338  #define DMU_POOL_CREATION_VERSION       "creation_version"
 325  339  #define DMU_POOL_SCAN                   "scan"
 326  340  #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
 327  341  #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
 328  342  #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
 329  343  #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
 330  344  #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
 331      -#define DMU_POOL_REMOVING               "com.delphix:removing"
 332      -#define DMU_POOL_OBSOLETE_BPOBJ         "com.delphix:obsolete_bpobj"
 333      -#define DMU_POOL_CONDENSING_INDIRECT    "com.delphix:condensing_indirect"
 334  345  
      346 +#define DMU_POOL_COS_PROPS              "cos_props"
      347 +#define DMU_POOL_VDEV_PROPS             "vdev_props"
      348 +#define DMU_POOL_TRIM_START_TIME        "trim_start_time"
      349 +#define DMU_POOL_TRIM_STOP_TIME         "trim_stop_time"
      350 +
 335  351  /*
 336  352   * Allocate an object from this objset.  The range of object numbers
 337  353   * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
 338  354   *
 339  355   * The transaction must be assigned to a txg.  The newly allocated
 340  356   * object will be "held" in the transaction (ie. you can modify the
 341  357   * newly allocated object in this transaction).
 342  358   *
 343  359   * dmu_object_alloc() chooses an object and returns it in *objectp.
 344  360   *
 345  361   * dmu_object_claim() allocates a specific object number.  If that
 346  362   * number is already allocated, it fails and returns EEXIST.
 347  363   *
 348  364   * Return 0 on success, or ENOSPC or EEXIST as specified above.
 349  365   */
 350  366  uint64_t dmu_object_alloc(objset_t *os, dmu_object_type_t ot,
 351  367      int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 352  368  int dmu_object_claim(objset_t *os, uint64_t object, dmu_object_type_t ot,
 353  369      int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 354  370  int dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,
 355  371      int blocksize, dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *txp);
 356  372  
 357  373  /*
 358  374   * Free an object from this objset.
 359  375   *
 360  376   * The object's data will be freed as well (ie. you don't need to call
 361  377   * dmu_free(object, 0, -1, tx)).
 362  378   *
 363  379   * The object need not be held in the transaction.
 364  380   *
 365  381   * If there are any holds on this object's buffers (via dmu_buf_hold()),
 366  382   * or tx holds on the object (via dmu_tx_hold_object()), you can not
 367  383   * free it; it fails and returns EBUSY.
 368  384   *
 369  385   * If the object is not allocated, it fails and returns ENOENT.
 370  386   *
 371  387   * Return 0 on success, or EBUSY or ENOENT as specified above.
 372  388   */
 373  389  int dmu_object_free(objset_t *os, uint64_t object, dmu_tx_t *tx);
 374  390  
 375  391  /*
 376  392   * Find the next allocated or free object.
 377  393   *
 378  394   * The objectp parameter is in-out.  It will be updated to be the next
 379  395   * object which is allocated.  Ignore objects which have not been
 380  396   * modified since txg.
 381  397   *
  382  398   * XXX Can only be called on an objset with no dirty data.
 383  399   *
 384  400   * Returns 0 on success, or ENOENT if there are no more objects.
 385  401   */
 386  402  int dmu_object_next(objset_t *os, uint64_t *objectp,
 387  403      boolean_t hole, uint64_t txg);
 388  404  
 389  405  /*
 390  406   * Set the data blocksize for an object.
 391  407   *
  392  408   * The object cannot have any blocks allocated beyond the first.  If
 393  409   * the first block is allocated already, the new size must be greater
 394  410   * than the current block size.  If these conditions are not met,
 395  411   * ENOTSUP will be returned.
 396  412   *
 397  413   * Returns 0 on success, or EBUSY if there are any holds on the object
 398  414   * contents, or ENOTSUP as described above.
 399  415   */
 400  416  int dmu_object_set_blocksize(objset_t *os, uint64_t object, uint64_t size,
 401  417      int ibs, dmu_tx_t *tx);
 402  418  
 403  419  /*
 404  420   * Set the checksum property on a dnode.  The new checksum algorithm will
 405  421   * apply to all newly written blocks; existing blocks will not be affected.
 406  422   */
  
  
 407  423  void dmu_object_set_checksum(objset_t *os, uint64_t object, uint8_t checksum,
 408  424      dmu_tx_t *tx);
 409  425  
 410  426  /*
 411  427   * Set the compress property on a dnode.  The new compression algorithm will
 412  428   * apply to all newly written blocks; existing blocks will not be affected.
 413  429   */
 414  430  void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
 415  431      dmu_tx_t *tx);
 416  432  
 417      -int dmu_object_remap_indirects(objset_t *os, uint64_t object, uint64_t txg);
 418      -
 419  433  void
 420  434  dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
 421  435      void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
 422  436      int compressed_size, int byteorder, dmu_tx_t *tx);
 423  437  
 424  438  /*
 425  439   * Decide how to write a block: checksum, compression, number of copies, etc.
 426  440   */
 427  441  #define WP_NOFILL       0x1
 428  442  #define WP_DMU_SYNC     0x2
 429  443  #define WP_SPILL        0x4
 430  444  
      445 +#define WP_SPECIALCLASS_SHIFT   (16)
       446 +#define WP_SPECIALCLASS_BITS    (1) /* 1 bit per storage class */
      447 +#define WP_SPECIALCLASS_MASK    (((1 << WP_SPECIALCLASS_BITS) - 1) \
      448 +        << WP_SPECIALCLASS_SHIFT)
      449 +
      450 +#define WP_SET_SPECIALCLASS(flags, sclass)      { \
      451 +        flags |= ((sclass << WP_SPECIALCLASS_SHIFT) & WP_SPECIALCLASS_MASK); \
      452 +}
      453 +
      454 +#define WP_GET_SPECIALCLASS(flags) \
      455 +        ((flags & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)
      456 +
 431  457  void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
 432  458      struct zio_prop *zp);
 433  459  /*
 434  460   * The bonus data is accessed more or less like a regular buffer.
 435  461   * You must dmu_bonus_hold() to get the buffer, which will give you a
 436  462   * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
 437      - * data.  As with any normal buffer, you must call dmu_buf_will_dirty()
 438      - * before modifying it, and the
      463 + * data.  As with any normal buffer, you must call dmu_buf_read() to
      464 + * read db_data, dmu_buf_will_dirty() before modifying it, and the
 439  465   * object must be held in an assigned transaction before calling
 440  466   * dmu_buf_will_dirty.  You may use dmu_buf_set_user() on the bonus
 441  467   * buffer as well.  You must release your hold with dmu_buf_rele().
 442  468   *
 443  469   * Returns ENOENT, EIO, or 0.
 444  470   */
 445  471  int dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **);
 446  472  int dmu_bonus_max(void);
 447  473  int dmu_set_bonus(dmu_buf_t *, int, dmu_tx_t *);
 448  474  int dmu_set_bonustype(dmu_buf_t *, dmu_object_type_t, dmu_tx_t *);
 449  475  dmu_object_type_t dmu_get_bonustype(dmu_buf_t *);
 450  476  int dmu_rm_spill(objset_t *, uint64_t, dmu_tx_t *);
 451  477  
 452  478  /*
 453  479   * Special spill buffer support used by "SA" framework
 454  480   */
 455  481  
 456  482  int dmu_spill_hold_by_bonus(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
 457  483  int dmu_spill_hold_by_dnode(dnode_t *dn, uint32_t flags,
 458  484      void *tag, dmu_buf_t **dbp);
 459  485  int dmu_spill_hold_existing(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
 460  486  
 461  487  /*
 462  488   * Obtain the DMU buffer from the specified object which contains the
 463  489   * specified offset.  dmu_buf_hold() puts a "hold" on the buffer, so
 464  490   * that it will remain in memory.  You must release the hold with
  465  491   * dmu_buf_rele().  You mustn't access the dmu_buf_t after releasing your
 466  492   * hold.  You must have a hold on any dmu_buf_t* you pass to the DMU.
 467  493   *
 468  494   * You must call dmu_buf_read, dmu_buf_will_dirty, or dmu_buf_will_fill
 469  495   * on the returned buffer before reading or writing the buffer's
 470  496   * db_data.  The comments for those routines describe what particular
 471  497   * operations are valid after calling them.
 472  498   *
 473  499   * The object number must be a valid, allocated object number.
 474  500   */
 475  501  int dmu_buf_hold(objset_t *os, uint64_t object, uint64_t offset,
 476  502      void *tag, dmu_buf_t **, int flags);
 477  503  int dmu_buf_hold_by_dnode(dnode_t *dn, uint64_t offset,
 478  504      void *tag, dmu_buf_t **dbp, int flags);
 479  505  
 480  506  /*
 481  507   * Add a reference to a dmu buffer that has already been held via
 482  508   * dmu_buf_hold() in the current context.
 483  509   */
 484  510  void dmu_buf_add_ref(dmu_buf_t *db, void* tag);
 485  511  
 486  512  /*
 487  513   * Attempt to add a reference to a dmu buffer that is in an unknown state,
 488  514   * using a pointer that may have been invalidated by eviction processing.
 489  515   * The request will succeed if the passed in dbuf still represents the
 490  516   * same os/object/blkid, is ineligible for eviction, and has at least
 491  517   * one hold by a user other than the syncer.
 492  518   */
 493  519  boolean_t dmu_buf_try_add_ref(dmu_buf_t *, objset_t *os, uint64_t object,
 494  520      uint64_t blkid, void *tag);
 495  521  
 496  522  void dmu_buf_rele(dmu_buf_t *db, void *tag);
 497  523  uint64_t dmu_buf_refcount(dmu_buf_t *db);
 498  524  
 499  525  /*
 500  526   * dmu_buf_hold_array holds the DMU buffers which contain all bytes in a
 501  527   * range of an object.  A pointer to an array of dmu_buf_t*'s is
 502  528   * returned (in *dbpp).
 503  529   *
 504  530   * dmu_buf_rele_array releases the hold on an array of dmu_buf_t*'s, and
 505  531   * frees the array.  The hold on the array of buffers MUST be released
 506  532   * with dmu_buf_rele_array.  You can NOT release the hold on each buffer
 507  533   * individually with dmu_buf_rele.
 508  534   */
 509  535  int dmu_buf_hold_array_by_bonus(dmu_buf_t *db, uint64_t offset,
 510  536      uint64_t length, boolean_t read, void *tag,
 511  537      int *numbufsp, dmu_buf_t ***dbpp);
 512  538  void dmu_buf_rele_array(dmu_buf_t **, int numbufs, void *tag);
 513  539  
 514  540  typedef void dmu_buf_evict_func_t(void *user_ptr);
 515  541  
 516  542  /*
 517  543   * A DMU buffer user object may be associated with a dbuf for the
 518  544   * duration of its lifetime.  This allows the user of a dbuf (client)
 519  545   * to attach private data to a dbuf (e.g. in-core only data such as a
 520  546   * dnode_children_t, zap_t, or zap_leaf_t) and be optionally notified
 521  547   * when that dbuf has been evicted.  Clients typically respond to the
 522  548   * eviction notification by freeing their private data, thus ensuring
 523  549   * the same lifetime for both dbuf and private data.
 524  550   *
 525  551   * The mapping from a dmu_buf_user_t to any client private data is the
 526  552   * client's responsibility.  All current consumers of the API with private
 527  553   * data embed a dmu_buf_user_t as the first member of the structure for
 528  554   * their private data.  This allows conversions between the two types
 529  555   * with a simple cast.  Since the DMU buf user API never needs access
 530  556   * to the private data, other strategies can be employed if necessary
 531  557   * or convenient for the client (e.g. using container_of() to do the
 532  558   * conversion for private data that cannot have the dmu_buf_user_t as
 533  559   * its first member).
 534  560   *
 535  561   * Eviction callbacks are executed without the dbuf mutex held or any
 536  562   * other type of mechanism to guarantee that the dbuf is still available.
 537  563   * For this reason, users must assume the dbuf has already been freed
 538  564   * and not reference the dbuf from the callback context.
 539  565   *
 540  566   * Users requesting "immediate eviction" are notified as soon as the dbuf
 541  567   * is only referenced by dirty records (dirties == holds).  Otherwise the
 542  568   * notification occurs after eviction processing for the dbuf begins.
 543  569   */
 544  570  typedef struct dmu_buf_user {
 545  571          /*
 546  572           * Asynchronous user eviction callback state.
 547  573           */
 548  574          taskq_ent_t     dbu_tqent;
 549  575  
 550  576          /*
 551  577           * This instance's eviction function pointers.
 552  578           *
 553  579           * dbu_evict_func_sync is called synchronously and then
 554  580           * dbu_evict_func_async is executed asynchronously on a taskq.
 555  581           */
 556  582          dmu_buf_evict_func_t *dbu_evict_func_sync;
 557  583          dmu_buf_evict_func_t *dbu_evict_func_async;
 558  584  #ifdef ZFS_DEBUG
 559  585          /*
 560  586           * Pointer to user's dbuf pointer.  NULL for clients that do
 561  587           * not associate a dbuf with their user data.
 562  588           *
 563  589           * The dbuf pointer is cleared upon eviction so as to catch
 564  590           * use-after-evict bugs in clients.
 565  591           */
 566  592          dmu_buf_t **dbu_clear_on_evict_dbufp;
 567  593  #endif
 568  594  } dmu_buf_user_t;
 569  595  
 570  596  /*
 571  597   * Initialize the given dmu_buf_user_t instance with the eviction function
 572  598   * evict_func, to be called when the user is evicted.
 573  599   *
 574  600   * NOTE: This function should only be called once on a given dmu_buf_user_t.
 575  601   *       To allow enforcement of this, dbu must already be zeroed on entry.
 576  602   */
 577  603  /*ARGSUSED*/
 578  604  inline void
 579  605  dmu_buf_init_user(dmu_buf_user_t *dbu, dmu_buf_evict_func_t *evict_func_sync,
 580  606      dmu_buf_evict_func_t *evict_func_async, dmu_buf_t **clear_on_evict_dbufp)
 581  607  {
 582  608          ASSERT(dbu->dbu_evict_func_sync == NULL);
 583  609          ASSERT(dbu->dbu_evict_func_async == NULL);
 584  610  
 585  611          /* must have at least one evict func */
 586  612          IMPLY(evict_func_sync == NULL, evict_func_async != NULL);
 587  613          dbu->dbu_evict_func_sync = evict_func_sync;
 588  614          dbu->dbu_evict_func_async = evict_func_async;
 589  615  #ifdef ZFS_DEBUG
 590  616          dbu->dbu_clear_on_evict_dbufp = clear_on_evict_dbufp;
 591  617  #endif
 592  618  }
 593  619  
 594  620  /*
 595  621   * Attach user data to a dbuf and mark it for normal (when the dbuf's
 596  622   * data is cleared or its reference count goes to zero) eviction processing.
 597  623   *
 598  624   * Returns NULL on success, or the existing user if another user currently
 599  625   * owns the buffer.
 600  626   */
 601  627  void *dmu_buf_set_user(dmu_buf_t *db, dmu_buf_user_t *user);
 602  628  
 603  629  /*
 604  630   * Attach user data to a dbuf and mark it for immediate (its dirty and
 605  631   * reference counts are equal) eviction processing.
 606  632   *
 607  633   * Returns NULL on success, or the existing user if another user currently
 608  634   * owns the buffer.
 609  635   */
 610  636  void *dmu_buf_set_user_ie(dmu_buf_t *db, dmu_buf_user_t *user);
 611  637  
 612  638  /*
 613  639   * Replace the current user of a dbuf.
 614  640   *
 615  641   * If given the current user of a dbuf, replaces the dbuf's user with
 616  642   * "new_user" and returns the user data pointer that was replaced.
 617  643   * Otherwise returns the current, and unmodified, dbuf user pointer.
 618  644   */
 619  645  void *dmu_buf_replace_user(dmu_buf_t *db,
 620  646      dmu_buf_user_t *old_user, dmu_buf_user_t *new_user);
 621  647  
 622  648  /*
 623  649   * Remove the specified user data for a DMU buffer.
 624  650   *
 625  651   * Returns the user that was removed on success, or the current user if
 626  652   * another user currently owns the buffer.
 627  653   */
 628  654  void *dmu_buf_remove_user(dmu_buf_t *db, dmu_buf_user_t *user);
 629  655  
 630  656  /*
 631  657   * Returns the user data (dmu_buf_user_t *) associated with this dbuf.
 632  658   */
 633  659  void *dmu_buf_get_user(dmu_buf_t *db);
 634  660  
 635  661  objset_t *dmu_buf_get_objset(dmu_buf_t *db);
 636  662  dnode_t *dmu_buf_dnode_enter(dmu_buf_t *db);
 637  663  void dmu_buf_dnode_exit(dmu_buf_t *db);
 638  664  
 639  665  /* Block until any in-progress dmu buf user evictions complete. */
 640  666  void dmu_buf_user_evict_wait(void);
 641  667  
 642  668  /*
 643  669   * Returns the blkptr associated with this dbuf, or NULL if not set.
 644  670   */
 645  671  struct blkptr *dmu_buf_get_blkptr(dmu_buf_t *db);
 646  672  
 647  673  /*
 648  674   * Indicate that you are going to modify the buffer's data (db_data).
 649  675   *
 650  676   * The transaction (tx) must be assigned to a txg (ie. you've called
 651  677   * dmu_tx_assign()).  The buffer's object must be held in the tx
 652  678   * (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
 653  679   */
 654  680  void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
      681 +void dmu_buf_will_dirty_sc(dmu_buf_t *db, dmu_tx_t *tx, boolean_t sc);
 655  682  
 656  683  /*
 657  684   * You must create a transaction, then hold the objects which you will
 658  685   * (or might) modify as part of this transaction.  Then you must assign
 659  686   * the transaction to a transaction group.  Once the transaction has
 660  687   * been assigned, you can modify buffers which belong to held objects as
 661  688   * part of this transaction.  You can't modify buffers before the
 662  689   * transaction has been assigned; you can't modify buffers which don't
 663  690   * belong to objects which this transaction holds; you can't hold
 664  691   * objects once the transaction has been assigned.  You may hold an
 665  692   * object which you are going to free (with dmu_object_free()), but you
 666  693   * don't have to.
 667  694   *
 668  695   * You can abort the transaction before it has been assigned.
 669  696   *
 670  697   * Note that you may hold buffers (with dmu_buf_hold) at any time,
 671  698   * regardless of transaction state.
 672  699   */
 673  700  
 674  701  #define DMU_NEW_OBJECT  (-1ULL)
 675  702  #define DMU_OBJECT_END  (-1ULL)
 676  703  
 677  704  dmu_tx_t *dmu_tx_create(objset_t *os);
 678  705  void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
 679  706  void dmu_tx_hold_write_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 680  707      int len);
 681  708  void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
 682  709      uint64_t len);
 683  710  void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 684  711      uint64_t len);
 685      -void dmu_tx_hold_remap_l1indirect(dmu_tx_t *tx, uint64_t object);
 686  712  void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
 687  713  void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
 688  714      const char *name);
 689  715  void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
 690  716  void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
 691  717  void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
 692  718  void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
 693  719  void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
 694  720  void dmu_tx_abort(dmu_tx_t *tx);
 695      -int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
      721 +int dmu_tx_assign(dmu_tx_t *tx, enum txg_how txg_how);
 696  722  void dmu_tx_wait(dmu_tx_t *tx);
 697  723  void dmu_tx_commit(dmu_tx_t *tx);
 698  724  void dmu_tx_mark_netfree(dmu_tx_t *tx);
 699  725  
 700  726  /*
 701  727   * To register a commit callback, dmu_tx_callback_register() must be called.
 702  728   *
 703  729   * dcb_data is a pointer to caller private data that is passed on as a
 704  730   * callback parameter. The caller is responsible for properly allocating and
 705  731   * freeing it.
 706  732   *
 707  733   * When registering a callback, the transaction must be already created, but
 708  734   * it cannot be committed or aborted. It can be assigned to a txg or not.
 709  735   *
 710  736   * The callback will be called after the transaction has been safely written
 711  737   * to stable storage and will also be called if the dmu_tx is aborted.
 712  738   * If there is any error which prevents the transaction from being committed to
 713  739   * disk, the callback will be called with a value of error != 0.
 714  740   */
 715  741  typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
 716  742  
 717  743  void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
 718  744      void *dcb_data);
 719  745  
 720  746  /*
 721  747   * Free up the data blocks for a defined range of a file.  If size is
 722  748   * -1, the range from offset to end-of-file is freed.
 723  749   */
 724  750  int dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
 725  751          uint64_t size, dmu_tx_t *tx);
 726  752  int dmu_free_long_range(objset_t *os, uint64_t object, uint64_t offset,
 727  753          uint64_t size);
 728  754  int dmu_free_long_object(objset_t *os, uint64_t object);
 729  755  
 730  756  /*
 731  757   * Convenience functions.
 732  758   *
 733  759   * Canfail routines will return 0 on success, or an errno if there is a
 734  760   * nonrecoverable I/O error.
 735  761   */
 736  762  #define DMU_READ_PREFETCH       0 /* prefetch */
 737  763  #define DMU_READ_NO_PREFETCH    1 /* don't prefetch */
 738  764  int dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 739  765          void *buf, uint32_t flags);
 740  766  int dmu_read_by_dnode(dnode_t *dn, uint64_t offset, uint64_t size, void *buf,
 741  767      uint32_t flags);
 742  768  void dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 743  769          const void *buf, dmu_tx_t *tx);
 744  770  void dmu_write_by_dnode(dnode_t *dn, uint64_t offset, uint64_t size,
 745  771      const void *buf, dmu_tx_t *tx);
 746  772  void dmu_prealloc(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 747  773          dmu_tx_t *tx);
 748  774  int dmu_read_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size);
 749  775  int dmu_read_uio_dbuf(dmu_buf_t *zdb, struct uio *uio, uint64_t size);
 750  776  int dmu_write_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size,
 751  777      dmu_tx_t *tx);
 752  778  int dmu_write_uio_dbuf(dmu_buf_t *zdb, struct uio *uio, uint64_t size,
 753  779      dmu_tx_t *tx);
 754  780  int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset,
 755  781      uint64_t size, struct page *pp, dmu_tx_t *tx);
 756  782  struct arc_buf *dmu_request_arcbuf(dmu_buf_t *handle, int size);
 757  783  void dmu_return_arcbuf(struct arc_buf *buf);
 758  784  void dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, struct arc_buf *buf,
 759  785      dmu_tx_t *tx);
 760  786  int dmu_xuio_init(struct xuio *uio, int niov);
 761  787  void dmu_xuio_fini(struct xuio *uio);
 762  788  int dmu_xuio_add(struct xuio *uio, struct arc_buf *abuf, offset_t off,
 763  789      size_t n);
 764  790  int dmu_xuio_cnt(struct xuio *uio);
 765  791  struct arc_buf *dmu_xuio_arcbuf(struct xuio *uio, int i);
 766  792  void dmu_xuio_clear(struct xuio *uio, int i);
 767  793  void xuio_stat_wbuf_copied(void);
 768  794  void xuio_stat_wbuf_nocopy(void);
 769  795  
 770  796  extern boolean_t zfs_prefetch_disable;
 771  797  extern int zfs_max_recordsize;
 772  798  
 773  799  /*
 774  800   * Asynchronously try to read in the data.
 775  801   */
 776  802  void dmu_prefetch(objset_t *os, uint64_t object, int64_t level, uint64_t offset,
 777  803      uint64_t len, enum zio_priority pri);
 778  804  
 779  805  typedef struct dmu_object_info {
 780  806          /* All sizes are in bytes unless otherwise indicated. */
 781  807          uint32_t doi_data_block_size;
 782  808          uint32_t doi_metadata_block_size;
 783  809          dmu_object_type_t doi_type;
 784  810          dmu_object_type_t doi_bonus_type;
 785  811          uint64_t doi_bonus_size;
 786  812          uint8_t doi_indirection;                /* 2 = dnode->indirect->data */
 787  813          uint8_t doi_checksum;
 788  814          uint8_t doi_compress;
 789  815          uint8_t doi_nblkptr;
 790  816          uint8_t doi_pad[4];
 791  817          uint64_t doi_physical_blocks_512;       /* data + metadata, 512b blks */
 792  818          uint64_t doi_max_offset;
 793  819          uint64_t doi_fill_count;                /* number of non-empty blocks */
 794  820  } dmu_object_info_t;
 795  821  
 796  822  typedef void arc_byteswap_func_t(void *buf, size_t size);
 797  823  
 798  824  typedef struct dmu_object_type_info {
 799  825          dmu_object_byteswap_t   ot_byteswap;
 800  826          boolean_t               ot_metadata;
      827 +        boolean_t               ot_dbuf_metadata_cache;
 801  828          char                    *ot_name;
 802  829  } dmu_object_type_info_t;
 803  830  
 804  831  typedef struct dmu_object_byteswap_info {
 805  832          arc_byteswap_func_t     *ob_func;
 806  833          char                    *ob_name;
 807  834  } dmu_object_byteswap_info_t;
 808  835  
 809  836  extern const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES];
 810  837  extern const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS];
 811  838  
 812  839  /*
 813  840   * Get information on a DMU object.
 814  841   *
 815  842   * Return 0 on success or ENOENT if object is not allocated.
 816  843   *
 817  844   * If doi is NULL, just indicates whether the object exists.
 818  845   */
 819  846  int dmu_object_info(objset_t *os, uint64_t object, dmu_object_info_t *doi);
 820  847  /* Like dmu_object_info, but faster if you have a held dnode in hand. */
 821  848  void dmu_object_info_from_dnode(dnode_t *dn, dmu_object_info_t *doi);
 822  849  /* Like dmu_object_info, but faster if you have a held dbuf in hand. */
 823  850  void dmu_object_info_from_db(dmu_buf_t *db, dmu_object_info_t *doi);
 824  851  /*
 825  852   * Like dmu_object_info_from_db, but faster still when you only care about
 826  853   * the size.  This is specifically optimized for zfs_getattr().
 827  854   */
 828  855  void dmu_object_size_from_db(dmu_buf_t *db, uint32_t *blksize,
 829  856      u_longlong_t *nblk512);
 830  857  
 831  858  typedef struct dmu_objset_stats {
 832  859          uint64_t dds_num_clones; /* number of clones of this */
 833  860          uint64_t dds_creation_txg;
 834  861          uint64_t dds_guid;
 835  862          dmu_objset_type_t dds_type;
 836  863          uint8_t dds_is_snapshot;
      864 +        uint8_t dds_is_autosnapshot;
 837  865          uint8_t dds_inconsistent;
 838  866          char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
 839  867  } dmu_objset_stats_t;
 840  868  
 841  869  /*
 842  870   * Get stats on a dataset.
 843  871   */
 844  872  void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
 845  873  
 846  874  /*
 847  875   * Add entries to the nvlist for all the objset's properties.  See
 848  876   * zfs_prop_table[] and zfs(1m) for details on the properties.
 849  877   */
 850  878  void dmu_objset_stats(objset_t *os, struct nvlist *nv);
 851  879  
 852  880  /*
 853  881   * Get the space usage statistics for statvfs().
 854  882   *
 855  883   * refdbytes is the amount of space "referenced" by this objset.
 856  884   * availbytes is the amount of space available to this objset, taking
 857  885   * into account quotas & reservations, assuming that no other objsets
 858  886   * use the space first.  These values correspond to the 'referenced' and
 859  887   * 'available' properties, described in the zfs(1m) manpage.
 860  888   *
 861  889   * usedobjs and availobjs are the number of objects currently allocated,
 862  890   * and available.
 863  891   */
 864  892  void dmu_objset_space(objset_t *os, uint64_t *refdbytesp, uint64_t *availbytesp,
 865  893      uint64_t *usedobjsp, uint64_t *availobjsp);
 866  894  
 867  895  /*
 868  896   * The fsid_guid is a 56-bit ID that can change to avoid collisions.
 869  897   * (Contrast with the ds_guid which is a 64-bit ID that will never
 870  898   * change, so there is a small probability that it will collide.)
 871  899   */
 872  900  uint64_t dmu_objset_fsid_guid(objset_t *os);
 873  901  
 874  902  /*
 875  903   * Get the [cm]time for an objset's snapshot dir
 876  904   */
 877  905  timestruc_t dmu_objset_snap_cmtime(objset_t *os);
 878  906  
 879  907  int dmu_objset_is_snapshot(objset_t *os);
 880  908  
 881  909  extern struct spa *dmu_objset_spa(objset_t *os);
 882  910  extern struct zilog *dmu_objset_zil(objset_t *os);
 883  911  extern struct dsl_pool *dmu_objset_pool(objset_t *os);
 884  912  extern struct dsl_dataset *dmu_objset_ds(objset_t *os);
 885  913  extern void dmu_objset_name(objset_t *os, char *buf);
 886  914  extern dmu_objset_type_t dmu_objset_type(objset_t *os);
 887  915  extern uint64_t dmu_objset_id(objset_t *os);
 888  916  extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
 889  917  extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);
      918 +int dmu_clone_list_next(objset_t *os, int len, char *name,
      919 +    uint64_t *idp, uint64_t *offp);
 890  920  extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
 891  921      uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
 892  922  extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
 893  923      int maxlen, boolean_t *conflict);
 894  924  extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
 895  925      uint64_t *idp, uint64_t *offp);
 896  926  
 897  927  typedef int objset_used_cb_t(dmu_object_type_t bonustype,
 898  928      void *bonus, uint64_t *userp, uint64_t *groupp);
 899  929  extern void dmu_objset_register_type(dmu_objset_type_t ost,
 900  930      objset_used_cb_t *cb);
 901  931  extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
 902  932  extern void *dmu_objset_get_user(objset_t *os);
 903  933  
 904  934  /*
 905  935   * Return the txg number for the given assigned transaction.
 906  936   */
 907  937  uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
 908  938  
 909  939  /*
 910  940   * Synchronous write.
 911  941   * If a parent zio is provided this function initiates a write on the
 912  942   * provided buffer as a child of the parent zio.
 913  943   * In the absence of a parent zio, the write is completed synchronously.
 914  944   * At write completion, blk is filled with the bp of the written block.
 915  945   * Note that while the data covered by this function will be on stable
 916  946   * storage when the write completes this new data does not become a
 917  947   * permanent part of the file until the associated transaction commits.
 918  948   */
 919  949  
 920  950  /*
 921  951   * {zfs,zvol,ztest}_get_done() args
 922  952   */
 923  953  typedef struct zgd {
 924  954          struct lwb      *zgd_lwb;
 925  955          struct blkptr   *zgd_bp;
 926  956          dmu_buf_t       *zgd_db;
 927  957          struct rl       *zgd_rl;
 928  958          void            *zgd_private;
 929  959  } zgd_t;
 930  960  
 931  961  typedef void dmu_sync_cb_t(zgd_t *arg, int error);
 932  962  int dmu_sync(struct zio *zio, uint64_t txg, dmu_sync_cb_t *done, zgd_t *zgd);
 933  963  
 934  964  /*
 935  965   * Find the next hole or data block in file starting at *off
 936  966   * Return found offset in *off. Return ESRCH for end of file.
 937  967   */
 938  968  int dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole,
 939  969      uint64_t *off);
 940  970  
 941  971  /*
 942  972   * Check if a DMU object has any dirty blocks. If so, sync out
 943  973   * all pending transaction groups. Otherwise, this function
 944  974   * does not alter DMU state. This could be improved to only sync
 945  975   * out the necessary transaction groups for this particular
 946  976   * object.
 947  977   */
 948  978  int dmu_object_wait_synced(objset_t *os, uint64_t object);
 949  979  
 950  980  /*
 951  981   * Initial setup and final teardown.
 952  982   */
 953  983  extern void dmu_init(void);
 954  984  extern void dmu_fini(void);
 955  985  
 956  986  typedef void (*dmu_traverse_cb_t)(objset_t *os, void *arg, struct blkptr *bp,
 957  987      uint64_t object, uint64_t offset, int len);
 958  988  void dmu_traverse_objset(objset_t *os, uint64_t txg_start,
 959  989      dmu_traverse_cb_t cb, void *arg);
 960  990  
 961  991  int dmu_diff(const char *tosnap_name, const char *fromsnap_name,
 962  992      struct vnode *vp, offset_t *offp);
 963  993  
 964  994  /* CRC64 table */
 965  995  #define ZFS_CRC64_POLY  0xC96C5795D7870F42ULL   /* ECMA-182, reflected form */
 966  996  extern uint64_t zfs_crc64_table[256];
 967  997  
 968  998  extern int zfs_mdcomp_disable;
 969  999  
 970 1000  #ifdef  __cplusplus
 971 1001  }
 972 1002  #endif
 973 1003  
 974 1004  #endif  /* _SYS_DMU_H */